Chapter 1

Ergodic properties of a class 
of non-Markovian processes 

Abstract 

We study a fairly general class of time-homogeneous stochastic evolutions 
driven by noises that are not white in time. As a consequence, the result- 
ing processes do not have the Markov property. In this setting, we obtain 
constructive criteria for the uniqueness of stationary solutions that are very 
close in spirit to the existing criteria for Markov processes. 

In the case of discrete time, where the driving noise consists of a sta- 
tionary sequence of Gaussian random variables, we give optimal conditions 
on the spectral measure for our criteria to be applicable. In particular, we 
show that under a certain assumption on the spectral density, our assump- 
tions can be checked in virtually the same way as one would check that the 
Markov process obtained by replacing the driving sequence by a sequence 
of independent identically distributed Gaussian random variables is strong 
Feller and topologically irreducible. The results of the present article extend 
those obtained previously in the continuous time context of diffusions driven 
by fractional Brownian motion. 

1.1 Introduction 

Stochastic processes have been used as a powerful modelling tool for decades 
in situations where the evolution of a system has some random component, 
be it intrinsic or to model the interaction with a complex environment. In 
its most general form, a stochastic process describes the evolution $X(t,\omega)$
of a system, where $t$ denotes the time parameter and $\omega$ takes values in
some probability space and abstracts the 'element of chance' describing the 
randomness of the process. 

In many situations of interest, the evolution of the system can be de- 
scribed (at least informally) by the solutions of an evolution equation of the 
type

$$\frac{dx}{dt} = F(x, \xi) , \tag{1.1a}$$

where $\xi$ is the 'noise' responsible for the randomness in the evolution. In the
present article, we will not be interested in the technical subtleties arising
from the fact that the time parameter $t$ in (1.1a) takes continuous values.
We will therefore consider its discrete analogue

$$x_{n+1} = F(x, \xi_n) , \tag{1.1b}$$

where $\xi_n$ describes the noise acting on the system between times $n$ and
$n+1$. Note that (1.1a) can always be reduced to (1.1b) by allowing $x_n$ to
represent not just the state of the system at time $n$, but its evolution over
the whole time interval $[n-1, n]$. We were intentionally vague about the
precise meaning of the symbol $x$ on the right-hand side of (1.1) in order to
suggest that there are situations where it makes sense to let the right-hand
side depend not only on the current state of the system, but on the whole
collection of its past states as well.

The process $x_n$ defined by a recursion of the type (1.1b) has the Markov
property if both of the following properties hold:

a. The noises $\{\xi_n\}_{n \in \mathbf{Z}}$ are mutually independent.

b. For a fixed value of $\xi$, the function $x \mapsto F(x, \xi)$ depends only on the
last state of the system.

In this article, we will be interested in the study of recursion relations of the
type (1.1b) when condition b. still holds, but the Markov property is lost
because condition a. fails to hold. Our main focus will be on the ergodic
properties of (1.1b), with the aim of providing concrete conditions that
ensure the uniqueness (in law) of a stationary sequence of random variables
$x_n$ satisfying a given recursion of the type (1.1b).

Many such criteria exist for Markov processes and we refer to (Meyn
& Tweedie 1993) for a comprehensive overview of the techniques developed
in this regard over the past seven decades. The aim of the present article
will be to present a framework in which recursions of the type (1.1b) can be
studied and such that several existing ergodicity results for Markov processes
have natural equivalents whose assumptions can also be checked in similar 
ways. This framework (which should be considered as nothing but a different 
way of looking at random dynamical systems, together with some slightly 
more restrictive topological assumptions) was developed in (Hairer 2005) and 
further studied in (Hairer & Ohashi 2007) in order to treat the ergodicity of 
stochastic differential equations driven by fractional Brownian motion. The 
main novelty of the present article is to relax a number of assumptions from 
the previous works and to include a detailed study of the discrete-time case 
when the driving noise is Gaussian. 
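
To make the discrete-time setting concrete, here is a minimal Python sketch of
a recursion of the type (1.1b) driven by a stationary Gaussian sequence that is
not independent in time. Everything specific in it is an assumption made for
illustration only: the linear map `F`, the AR(1) form of the noise, the
parameter `a` and the helper `lag1_correlation` do not come from the text.
With `a = 0` the noises are i.i.d. and the process is Markov; with `a != 0`
condition a. fails while condition b. still holds.

```python
import math
import random


def simulate(n_steps, a=0.8, x0=0.0, seed=0):
    """Iterate x_{n+1} = F(x_n, xi_n) with correlated stationary Gaussian noise.

    The noise follows xi_{n+1} = a * xi_n + sqrt(1 - a^2) * eta_n with eta_n
    i.i.d. N(0, 1), so every xi_n is N(0, 1) but consecutive noises are
    correlated whenever a != 0 (an illustrative choice, not the text's).
    """
    rng = random.Random(seed)

    def F(x, xi):
        # Hypothetical evolution map, chosen only for illustration.
        return 0.5 * x + xi

    xi = rng.gauss(0.0, 1.0)  # start the noise in its stationary law
    x = x0
    xs, xis = [x], [xi]
    for _ in range(n_steps):
        x = F(x, xi)
        xi = a * xi + math.sqrt(1.0 - a * a) * rng.gauss(0.0, 1.0)
        xs.append(x)
        xis.append(xi)
    return xs, xis


def lag1_correlation(seq):
    """Empirical correlation between consecutive entries of a sequence."""
    mean = sum(seq) / len(seq)
    num = sum((seq[i] - mean) * (seq[i + 1] - mean) for i in range(len(seq) - 1))
    den = sum((v - mean) ** 2 for v in seq)
    return num / den
```

In this toy model the pair (state, current noise) is Markov even though the
state alone is not, which is the kind of enlargement of the state space that
the framework of Section 1.2 formalises.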

The remainder of this article is organised as follows. After introducing 
our notations at the end of this section, we will introduce in Section 1.2 the 
framework studied in the present article. We then proceed in Section 1.3 to 
a comparison of this framework with that of random dynamical systems. In 
Section 1.4 we recall a few general ergodicity criteria for Markov processes 
and give a very similar criterion that can be applied in our framework. In 
Section 1.5 we finally study in detail the case of a system driven by a (time- 
discrete) stationary sequence of Gaussian random variables. We derive an 
explicit condition on the spectral measure of the sequence that ensures that 
such a system behaves qualitatively like the same system driven by an i.i.d. 
sequence of Gaussian random variables. 



1.1.1 Notations 

The following notations will be used throughout this article. Unless stated 
otherwise, measures will always be Borel measures over Polish (i.e. metris- 
able, complete, separable) spaces and they will always be positive. We write 
$\mathcal{M}_+(X)$ for the set of all such measures on the space $X$ and
$\mathcal{M}_1(X)$ for the subset of all probability measures. We write
$\mu \approx \nu$ to indicate that $\mu$ and $\nu$ are equivalent (i.e. they are
mutually absolutely continuous, that is they have the same negligible sets)
and $\mu \perp \nu$ to indicate that they are mutually singular.

Given a map $f: X \to Y$ and a measure $\mu$ on $X$, we denote by $f^*\mu$ the
push-forward measure $\mu \circ f^{-1}$ on $Y$. Given a product space $X \times Y$, we will
use the notation $\Pi_X$ and $\Pi_Y$ to denote the projections onto the two factors.
For infinite products like $X^{\mathbf{Z}}$, $X^{\mathbf{Z}_-}$ or $X^{\mathbf{N}}$, we denote by $\Pi_n$ the
projection onto the $n$th factor ($n$ can be negative in the first two cases).

We will also make use of the concatenation operator $\sqcup$ from
$X^{\mathbf{Z}_-} \times X^n$ to $X^{\mathbf{Z}_-}$ defined in the natural way by

$$(w \sqcup w')_k = \begin{cases} w_{k+n} & \text{if } k \le -n, \\ w'_{k+n} & \text{otherwise.} \end{cases}$$



Finally, given a Markov transition probability $V: X \to \mathcal{M}_1(X)$, we will
use the same symbol for the associated Markov operator acting on observables
$\phi: X \to \mathbf{R}$ by $(V\phi)(x) = \int_X \phi(y)\, V(x, dy)$, and the dual operator
acting on probability measures $\mu$ by $(V\mu)(A) = \int_X V(x, A)\, \mu(dx)$.
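
On a finite state space the two actions above reduce to matrix-vector
products, and the duality between them can be checked directly. The following
Python sketch does this for a made-up 3-state kernel; the names `kernel`,
`phi`, `mu` and the helper functions are illustrative assumptions, not
notation from the text.

```python
def apply_to_observable(kernel, phi):
    """Return the observable x -> sum_y kernel[x][y] * phi[y]."""
    return [sum(row[y] * phi[y] for y in range(len(phi))) for row in kernel]


def apply_to_measure(kernel, mu):
    """Return the measure y -> sum_x mu[x] * kernel[x][y]."""
    n_targets = len(kernel[0])
    return [sum(mu[x] * kernel[x][y] for x in range(len(mu))) for y in range(n_targets)]


def integrate(phi, mu):
    """Integral of the observable phi against the measure mu."""
    return sum(f * m for f, m in zip(phi, mu))


# Hypothetical kernel: each row is a probability vector.
kernel = [[0.9, 0.1, 0.0],
          [0.2, 0.5, 0.3],
          [0.0, 0.4, 0.6]]
phi = [1.0, -2.0, 0.5]
mu = [0.2, 0.3, 0.5]

lhs = integrate(apply_to_observable(kernel, phi), mu)  # integral of (kernel phi) against mu
rhs = integrate(phi, apply_to_measure(kernel, mu))     # integral of phi against (kernel mu)
```

Both numbers agree up to rounding, which is the finite-dimensional form of the
statement that the operator on measures is dual to the operator on observables.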

1.2 Skew-products

Whatever stochastic process X one may wish to consider, it is always pos- 
sible to turn it into a Markov process by adding sufficiently many 'hidden' 
degrees of freedom to the state space. For example, one can take the state 
space large enough to contain all possible information about the past of X, 
as well as all possible information on the future of the driving noise $\xi$. The
evolution (1.1) is then deterministic, with all randomness injected once and 
for all by drawing £ initially according to the appropriate distribution. This 
is the point of view of random dynamical systems explained in more detail 
in Section 1.3 below. 

On the other hand, one could take a somewhat smaller 'noise space' 
that contains only information about the past of the driving noise $\xi$. In
this case, the evolution is no longer deterministic, but it becomes a skew- 
product between a Markovian evolution for the noise (with the transition 
probabilities given informally by the conditional distribution of the 'future' 
given the 'past') and a deterministic map that solves (1.1). This is the 
viewpoint that was developed in (Hairer 2005, Hairer & Ohashi 2007) and 
will be studied further in this article. 

The framework that will be considered here is the following. Let $W$ and
$X$ be two Polish spaces that will be called the 'noise space' and the 'state
space' respectively, let $V$ be a Markov transition kernel on $W$, and let
$\Phi: W \times X \to X$ be an 'evolution map'. Throughout this article, we will make
the following standing assumptions:

1. There exists a probability measure P on W which is invariant for V and 
such that the law of the corresponding stationary process is ergodic 
under the shift map. 

2. The map $\Phi: W \times X \to X$ is continuous in the product topology.

We will also occasionally impose some regularity of the kernel $V(w, \cdot)$ as
a function of $w$. We therefore state the following property which will not
always be assumed to hold:

3. The transition kernel $V$ is Feller, that is the function $V\phi$ defined by
$(V\phi)(w) = \int_W \phi(w')\, V(w, dw')$ is continuous as soon as $\phi$ is continuous.

There are two objects that come with a construction such as the one
above. First, we can define a Markov transition operator $Q$ on $X \times W$ by

$$Q(x, w; A \times B) = \int_B \delta_{\Phi(x, w')}(A)\, V(w, dw') . \tag{1.2}$$

In words, we first draw an element $w'$ from the noise space according to
the law $V(w, \cdot)$ and we then update the state of the system with that noise
according to $\Phi$. We also introduce a 'solution map' $S: X \times W \to \mathcal{M}_1(X^{\mathbf{N}})$
that takes as arguments an initial condition $x \in X$ and an 'initial noise'
$w$ and returns the law of the corresponding solution process, that is the
marginal on $X$ of the law of the Markov process starting at $(x, w)$ with
transition probabilities $Q$.

The point of view that we take in this article is that $S$ encodes all
the 'physically relevant' part of the evolution (1.1), and that the particular
choice of noise space is just a mathematical tool. This motivates the
introduction of an equivalence relation between probability measures on $X \times W$
by

$$\mu \sim \nu \quad\Longleftrightarrow\quad S\mu = S\nu . \tag{1.3}$$

(Here, we used the shorthand $S\mu = \int S(x, w)\, \mu(dx, dw)$.) In the remainder
of this article, when we look for criteria that ensure the uniqueness
of the invariant measure for $Q$, this will always be understood to hold up to
the equivalence relation (1.3).

Remark 1.2.1. The word 'skew-product' is sometimes used in a slightly
different way in the literature. In our framework, given a realisation of
the noise, that is a realisation of a Markov process on $W$ with transition
probabilities V , the evolution in X is purely deterministic. This is differ- 
ent from, for example, the skew-product decomposition of Brownian motion 
where, given one realisation of the evolution of the radial part, the evolution 
of the angular part is still random. 

1.2.1 Admissible measures 

We consider the invariant measure $P$ for the noise process to be fixed.
Therefore, we will usually consider measures on $X \times W$ such that their projections
on $W$ are equal to $P$. Let us call such probability measures admissible and let
us denote the set of admissible probability measures by $\mathcal{M}_P(X)$. Obviously,
the Markov operator $Q$ maps the set of admissible probability measures into
itself. Since we assumed that $W$ is a Polish space, it is natural to endow
$\mathcal{M}_P(X)$ with the topology of weak convergence. This topology is preserved
by $Q$ if we assume that $V$ is Feller:

Lemma 1.2.2. If $\Phi$ is continuous and $V$ is Feller, then the Markov
transition operator $Q$ is also Feller and therefore continuous in the topology of
weak convergence on $X \times W$. □

The proof of this result is straightforward and of no particular interest, 
so we leave it as an exercise. 

There are however cases of interest in which we do not wish to assume
that $V$ is Feller. In this case, a natural topology for the space $\mathcal{M}_P(X)$ is
given by the 'narrow topology', see (Valadier 1990, Crauel 2002b). In order
to define this topology, denote by $C_P(X)$ the set of functions
$\phi: X \times W \to \mathbf{R}$ such that $x \mapsto \phi(x, w)$ is bounded and continuous for
every $w \in W$, $w \mapsto \phi(x, w)$ is measurable for every $x \in X$, and
$\int_W \sup_{x \in X} |\phi(x, w)|\, P(dw) < \infty$. The narrow topology on $\mathcal{M}_P(X)$ is
then the coarsest topology such that the map
$\mu \mapsto \int \phi(x, w)\, \mu(dx, dw)$ is continuous for every $\phi \in C_P(X)$.
Using Lebesgue's dominated convergence theorem, it is straightforward to
show that $Q$ is continuous in the narrow topology without requiring any
assumption besides the continuity of $\Phi$.

An admissible probability measure $\mu$ is now called an invariant measure
for the skew-product $(W, P, V, X, \Phi)$ if it is an invariant measure for $Q$,
that is if $Q\mu = \mu$. We call it a stationary measure if $Q\mu \sim \mu$, that is if the
law of the $X$-component of the Markov process with transition probabilities
$Q$ starting from $\mu$ is stationary. Using the standard Krylov-Bogoliubov
argument, one shows that

Lemma 1.2.3. Given any stationary measure $\mu$ as defined above, there
exists an invariant measure $\hat\mu$ such that $\hat\mu \sim \mu$.

Proof. Define a sequence of probability measures $\mu_N$ on $X \times W$ by
$\mu_N = \frac{1}{N} \sum_{n=1}^{N} Q^n \mu$. Since, for every $N$, the marginal of $\mu_N$ on $W$
is equal to $P$ and the marginal on $X$ is equal to the marginal of $\mu$ on $X$ (by
stationarity), this sequence is tight in the narrow topology (Crauel 2002b).
It therefore has at least one accumulation point $\hat\mu$ and the continuity of $Q$
in the narrow topology ensures that $\hat\mu$ is indeed an invariant measure for $Q$. □
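
The averaging step of the Krylov-Bogoliubov argument can be seen in a toy
setting. The Python sketch below is only a finite-state analogue with no noise
component and no equivalence relation, so it illustrates the averaging and
nothing more; the two-state 'flip' kernel and the function names are
assumptions made for the example. For this kernel the iterates of a point mass
oscillate forever, while their Cesàro averages settle on an invariant measure.

```python
def push(kernel, mu):
    """Image of the measure mu under the kernel: y -> sum_x mu[x] * kernel[x][y]."""
    return [sum(mu[x] * kernel[x][y] for x in range(len(mu)))
            for y in range(len(kernel[0]))]


def cesaro_average(kernel, mu, N):
    """Average of the first N iterates of mu under the kernel (n = 1, ..., N)."""
    total = [0.0] * len(mu)
    current = list(mu)
    for _ in range(N):
        current = push(kernel, current)
        total = [t + c for t, c in zip(total, current)]
    return [t / N for t in total]


# Hypothetical periodic chain: state 0 jumps to 1 and state 1 jumps to 0.
flip = [[0.0, 1.0],
        [1.0, 0.0]]
mu = [1.0, 0.0]

avg = cesaro_average(flip, mu, 1000)
```

Here `push(flip, mu)` differs from `mu`, whereas `avg` is left unchanged by
`push`, in line with the role played by the accumulation point in the proof.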

The aim of this article is to present some criteria that allow one to show
the uniqueness up to the equivalence relation (1.3) of the invariant measure
for a given skew-product. The philosophy that we will pursue is not to
apply existing criteria to the Markov semigroup $Q$. This is because, in
typical situations like a random differential equation driven by some stationary
Gaussian process, the noise space $W$ is very 'large' and so the Markov
operators $V$ and $Q$ typically do not have any of the 'nice' properties (strong
Feller property, $\psi$-irreducibility, etc.) that are often required in the ergodic
theory of Markov processes.

1.2.2 A simple example 

In this section, we give a simple example that illustrates the fact that it is
possible in some situations to have non-uniqueness of the invariant measure
for $Q$, even though $P$ is ergodic and one has uniqueness up to the equivalence
relation (1.3). Take $W = \{0,1\}^{\mathbf{Z}_-}$ and define the 'concatenation and shift'
map $\theta: W \times \{0,1\} \to W$ by

$$\theta(w, j)_n = \begin{cases} w_{n+1} & \text{for } n < 0, \\ j & \text{for } n = 0. \end{cases}$$

Fix $p \in (0,1)$, let $\xi$ be a random variable that takes the values $0$ and $1$ with
probabilities $p$ and $1-p$ respectively, and define the transition probabilities
$V$ by

$$V(w, \cdot) = \text{law of } \theta(w, \xi) .$$

We then take as our state space $X = \{0,1\}$ and we define an evolution $\Phi$ by

$$\Phi(x, w) = \begin{cases} x & \text{if } w_0 = w_{-1}, \\ 1 - x & \text{otherwise.} \end{cases} \tag{1.4}$$

It is clear that there are two extremal invariant measures for this evolution.
One of them charges the set of pairs $(w, x)$ such that $x = w_0$, the other
one charges the set of pairs such that $x = 1 - w_0$. (The projection of these
measures onto $W$ is Bernoulli with parameter $p$ in both cases.) However, if
$p = 1/2$, both invariant measures give rise to the same stationary process in
$X$, which is just a sequence of independent Bernoulli random variables.
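
The two invariant sets of this example can be checked by direct simulation.
The Python sketch below assumes that the evolution map keeps the state when
the two most recent noise bits agree and flips it otherwise, which is the form
forced by the two invariant sets described above; the function names, the seed
and the choice of tracking only the last noise bit (rather than the whole past)
are illustrative assumptions.

```python
import random


def phi(x, w_prev, w_new):
    """Evolution map of the example: keep x if the last two noise bits agree, else flip."""
    return x if w_new == w_prev else 1 - x


def run(p, n_steps, aligned, seed=0):
    """Simulate (state, latest noise bit), started with x = w_0 or x = 1 - w_0."""
    rng = random.Random(seed)

    def draw():
        # Noise bit equal to 0 with probability p and to 1 with probability 1 - p.
        return 0 if rng.random() < p else 1

    w = draw()
    x = w if aligned else 1 - w
    xs, ws = [x], [w]
    for _ in range(n_steps):
        w_new = draw()
        x = phi(x, w, w_new)
        w = w_new
        xs.append(x)
        ws.append(w)
    return xs, ws
```

Started with `aligned=True` the state always equals the latest noise bit, and
with `aligned=False` it always equals its complement. The fraction of zeros in
the state sequence is therefore close to `p` in the first case and to `1 - p`
in the second, so the two stationary processes have the same law only when
`p = 1/2`.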

1.2.3 An important special case 

In most of the remainder of this article, we are going to focus on the following
particular case of the setup described above. Suppose that there exists a
Polish space $W_0$ which carries all the information that is needed in order to
reconstruct the dynamic of the system from one time-step to the next. We
then take $W$ of the form $W = W_0^{\mathbf{Z}_-}$ (with the product topology) and we
assume that $\Phi$ is of the form $\Phi(x, w) = \Phi_0(x, w_0)$ for some jointly continuous
function $\Phi_0: X \times W_0 \to X$. Here, we use the notation $w = (\ldots, w_{-1}, w_0)$
for elements of $W$. Concerning the transition probabilities $V$, we fix a Borel
measure $P$ on $W$ which is invariant and ergodic for the shift map$^1$
$(\theta w)_n = w_{n-1}$, and we define a measurable map $\bar V: W \to \mathcal{M}_1(W_0)$ as the
regular conditional probabilities of $P$ under the splitting $W \approx W \times W_0$.

The transition probabilities $V(w, \cdot)$ are then constructed as the
push-forward of $\bar V(w, \cdot)$ under the concatenation map $f_w: W_0 \to W$ given by
$f_w(w') = w \sqcup w'$. Since we assumed that $P$ is shift-invariant, it follows from
the construction that it is automatically invariant for $V$.

Many natural situations fall under this setup, even if they do not look
so at first sight. For example, in the case of the example from the previous
section, one would be tempted to take $W_0 = \{0,1\}$. This does not work
since the function $\Phi$ defined in (1.4) depends not only on $w_0$ but also on
$w_{-1}$. However, one can choose $W_0 = \{0,1\}^2$ and identify $W$ with the subset
of all sequences $\{w_n\}_{n \le 0}$ in $W_0^{\mathbf{Z}_-}$ such that $w_n^2 = w_{n+1}^1$ for every $n < 0$.

1.3 Skew-products of Markov processes versus random dynamical systems

There already exists a mature theory which was developed precisely in order
to study systems like (1.1). The theory in question is of course that
of random dynamical systems (RDS in the sequel), which was introduced
under this name in the nineties by Arnold and then developed further by
a number of authors, in particular Caraballo, Crauel, Debussche, Flandoli,
Robinson, Schmalfuß, and many others. Actually, skew-products of flows
had been considered by authors much earlier, see for example (Sacker &
Sell 1973, Sacker & Sell 1977), or the monograph (Kifer 1986), but previous
authors usually made very restrictive assumptions on the structure of the
noise, either having independent noises at each step or some periodicity or
quasi-periodicity. We refer to the monograph (Arnold 1998) for a thorough
exposition of the theory, but for the sake of completeness, we briefly recall
the main framework here. For simplicity, and in order to facilitate
comparison with the alternative framework presented in this article, we restrict
ourselves to the case of discrete time. An RDS consists of a dynamical
system $(\Omega, P, \theta)$ (here $P$ is a probability on the measurable space $\Omega$ which is
both invariant and ergodic for the map $\theta: \Omega \to \Omega$) together with a 'state
space' $X$ and a map $\Phi: \Omega \times X \to X$. For every initial condition $x \in X$,
this allows one to construct a stochastic process $X_n$ over $(\Omega, P)$, viewed as a
probability space, by

$$X_0(\omega) = x , \qquad X_{n+1}(\omega) = \Phi(\theta^n \omega, X_n(\omega)) .$$

Note the similarity with (1.2). The main difference is that the evolution
on the 'noise space' $\Omega$ is deterministic. This means that an element of $\Omega$
must contain all possible information on the future of the noise driving the
system. In fact, one can consider an RDS as a dynamical system $\hat\Phi$ over the
product space $\Omega \times X$ via

$$\hat\Phi(x, \omega) = (\theta\omega, \Phi(x, \omega)) .$$

The stochastic process $X_n$ is then nothing but the projection on $X$ of a
'typical' orbit of $\hat\Phi$. An invariant measure for an RDS $(\Omega, P, \theta, X, \Phi)$ is a
probability measure $\mu$ on $\Omega \times X$ which is invariant under $\hat\Phi$ and such that
its marginal on $\Omega$ is equal to $P$. It can be shown (Arnold 1998) that such
measures can be described by their 'disintegration' over $(\Omega, P)$, which is a
map $\omega \mapsto \mu_\omega$ from $\Omega$ into the set of probability measures on $X$.

1 Recall that a probability measure $\mu$ is ergodic for a map $T$ leaving $\mu$
invariant if all $T$-invariant measurable sets are of $\mu$-measure $0$ or $1$.

Consider the example of an elliptic diffusion $X_t$ on a compact manifold
$\mathcal{M}$. Let us be even more concrete and take for $X_t$ a simple Brownian motion
and for $\mathcal{M}$ the unit circle $S^1$. When considered from the point of view of the
Markov semigroup $V_t$ generated by $X_t$, it is straightforward to show that
there exists a unique invariant probability measure for $V_t$. In our example,
this invariant measure is of course simply the Lebesgue measure on the circle.

Consider now $X_t$ as generated from a random dynamical system (since we
focus on the discrete time case, choose $t$ to take integer values). At this stage,
we realise that we have a huge freedom of choice when it comes to finding an
underlying dynamical system $(\Omega, P, \theta)$ and a map $\Phi: \Omega \times S^1 \to S^1$ such that
the corresponding stochastic process is equal to our Brownian motion $X_t$.
The most immediate choice would be to take for $\Omega$ the space of real-valued
continuous functions vanishing at the origin, $P$ equal to the Wiener measure,
$\theta$ the shift map $(\theta B)(s) = B(s+1) - B(1)$, and $\Phi(x, B) = x + B(1)$. In
this case, it is possible to show that there exists a unique invariant measure
for this random dynamical system and that this invariant measure is equal
to the product measure of $P$ with Lebesgue measure on $S^1$, so that $\mu_\omega$ is
equal to the Lebesgue measure for every $\omega$.

However, we could also have considered $X_t$ as the solution to the
stochastic differential equation

$$dX(t) = \sin(kX(t))\, dB_1(t) + \cos(kX(t))\, dB_2(t) ,$$

where $B_1$ and $B_2$ are two independent Wiener processes and $k$ is an arbitrary
integer. In this case, it turns out (Le Jan 1987, Crauel 2002a) that there are
two invariant measures $\mu^+$ and $\mu^-$ for the corresponding random dynamical
system. Both of them are such that, for almost every $\omega$, $\mu_\omega$ is equal to
a sum of $k$ $\delta$-measures of weights $1/k$. Furthermore, the map $\omega \mapsto \mu^-_\omega$ is
measurable with respect to the filtration generated by the increments of
$B_i(t)$ for negative $t$, whereas $\omega \mapsto \mu^+_\omega$ is measurable with respect to the
filtration generated by the increments of $B_i(t)$ for positive $t$.

What this example makes clear is that while the ergodic theory of $X_t$
considered as a Markov process focuses on the long-time behaviour of one
instance of $X_t$ started at an arbitrary but fixed initial condition, the
theory of random dynamical systems instead focuses on the (potentially much
richer) simultaneous long-time behaviour of several instances of $X_t$ driven
by the same instance of the noise. Furthermore, it shows that a random
dynamical system may have invariant measures that are 'unphysical' in the
sense that they can be realised only by initialising the state of our system
in some way that requires clairvoyant knowledge of the entire future of its
driving noise.

In the framework presented in the previous section, such unphysical
invariant measures never arise, since our noise space contains only
information about the 'past' of the noise. Actually, given a skew-product of
Markov processes as before, one can construct in a canonical way a random
dynamical system by taking $\Omega = W^{\mathbf{Z}}$, $\theta$ the shift map, and $P$ the
measure on $\Omega = W^{\mathbf{Z}}$ corresponding to the law of the stationary Markov
process with transition probabilities $V$ and one-point distribution $P$. The
map $\Phi$ is then given by $\Phi(x, \omega) = \Phi(x, \omega_0)$. With this correspondence, an
invariant measure for the skew-product yields an invariant measure for the
corresponding random dynamical system but, as the example given above
shows, the converse is not true in general.

1.4 Ergodicity criteria for Markov semigroups 

Consider a Markov transition kernel $V$ on some Polish space $X$. Recall
that an invariant measure $\mu$ for $V$ is said to be ergodic if the law of the
corresponding stationary process is ergodic for the shift map. It is a
well-known fact that if a Markov transition kernel has more than one invariant
measure, then it must have at least two of them that are mutually singular.
Therefore, the usual strategy for proving the uniqueness of the invariant
measure for $V$ is to assume that $V$ has two mutually singular invariant
measures $\mu$ and $\nu$ and to arrive at a contradiction.

This section is devoted to the presentation of some ergodicity criteria for 
a Markov process on a general state space X and to their extension to the 
framework presented in Section 1.2. If X happens to be countable (or finite), 
the transition probabilities are given by a transition matrix $P = (P_{ij})$ ($P_{ij}$
being the probability of going from $i$ to $j$ in time 1) and there is a very
simple characterisation of those transition probabilities that can lead to at
most one invariant probability measure. In a nutshell, ergodicity is implied
by the existence of one point which cannot be avoided by the dynamic:

Proposition 1.4.1. Let $P$ be a transition matrix. If there exists a state
$j$ such that, for every $i$, $\sum_{n \ge 1} (P^n)_{ij} > 0$, then $P$ can have at most one
invariant probability measure. Conversely, if $P$ has exactly one invariant
probability measure, then there exists a state $j$ with the above property.
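
For a finite transition matrix, the condition of Proposition 1.4.1 only
depends on which entries are positive, so it can be tested by a reachability
search. The Python sketch below is one possible way to do this under that
finiteness assumption; the function names and the two example matrices are
made up for illustration.

```python
def reachable_in_positive_time(P, i):
    """States j such that (P^n)[i][j] > 0 for some n >= 1."""
    n = len(P)
    frontier = [j for j in range(n) if P[i][j] > 0]
    seen = set(frontier)
    while frontier:
        k = frontier.pop()
        for j in range(n):
            if P[k][j] > 0 and j not in seen:
                seen.add(j)
                frontier.append(j)
    return seen


def unavoidable_states(P):
    """States j that can be reached in positive time from every state i."""
    common = set(range(len(P)))
    for i in range(len(P)):
        common &= reachable_in_positive_time(P, i)
    return common


# Hypothetical chain drifting towards the absorbing state 2.
drift = [[0.5, 0.5, 0.0],
         [0.0, 0.5, 0.5],
         [0.0, 0.0, 1.0]]

# Hypothetical chain with two absorbing states, hence two invariant measures.
split = [[1.0, 0.0],
         [0.0, 1.0]]
```

For `drift` the search returns the single state 2, so there is at most one
invariant probability measure; for `split` it returns the empty set, which is
consistent with the converse part of the proposition.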

There is no such clean criterion available in the case of general state
space, but the following comes relatively close. Recall that a Markov
transition operator $P$ over a Polish space $X$ is said to be strong Feller if it maps the
space of bounded measurable functions into the space of bounded continuous
functions. This is equivalent to the continuity of transition probabilities
in the topology of strong convergence of measures. With this definition, one
has the following criterion, of which a proof can be found for example in
(Da Prato & Zabczyk 1996):

Proposition 1.4.2. Let $P$ be a strong Feller Markov transition operator on
a Polish space $X$. If $\mu$ and $\nu$ are two invariant measures for $P$ that are
mutually singular, then $\operatorname{supp} \mu \cap \operatorname{supp} \nu = \emptyset$.

This is usually used together with some controllability argument in the
following way: 

Corollary 1.4.3. Let P be strong Feller. If there exists x such that x belongs 
to the support of every invariant measure for P, then P can have at most 
one invariant measure. 

Remark 1.4.4. The importance of the strong Feller property is that it allows
one to replace a measure-theoretical statement ($\mu$ and $\nu$ are mutually singular)
by a stronger topological statement (the topological supports of $\mu$ and $\nu$ are
disjoint) which is then easier to invalidate by a controllability argument.
If one further uses the fact that $\mu$ and $\nu$ are invariant measures, one can
actually replace the strong Feller property by the weaker asymptotic strong
Feller property (Hairer & Mattingly 2006), but we will not consider this
generalisation here.

A version with slightly stronger assumptions that however leads to a
substantially stronger conclusion is usually attributed to Doob and
Khasminskii:

Theorem 1.4.5. Let $P$ be a strong Feller Markov transition operator on a
Polish space $X$. If there exists $n \ge 1$ such that, for every open set $A \subset X$
and every $x \in X$, one has $P^n(x, A) > 0$, then the measures $P^m(x, \cdot)$ and
$P^m(y, \cdot)$ are equivalent for every pair $(x, y) \in X^2$ and for every $m > n$. In
particular, $P$ can have at most one invariant probability measure and, if it
exists, it is equivalent to $P^{n+1}(x, \cdot)$ for every $x$.

These criteria suggest that we should look for a version of the strong 
Feller property that is suitable for our context. Requiring Q to be strong 
Feller is a very strong requirement which will not be fulfilled in many cases 
of interest. On the other hand, since there is nothing like a semigroup on $X$,
it is not clear a priori how the strong Feller property should be translated
to our framework. By contrast, the ultra Feller property, that is
the continuity of the transition probabilities in the total variation topology,
is easier to generalise to our setting. Even though this property seems
at first sight to be stronger than the strong Feller property (the topology 
on probability measures induced by the total variation distance is strictly 
stronger than the one induced by strong convergence), it turns out that 
the two are 'almost' equivalent. More precisely, if two Markov transition 
operators P and Q are strong Feller, then PQ is ultra Feller. Since this fact 
is not easy to find in the literature, we will give a self-contained proof in 
Appendix 1.6 below. 
One possible generalisation, and this is the one that we will retain here,
is given by the following:

Definition 1.4.6. A skew-product $(W, P, V, X, \Phi)$ is said to be strong Feller
if there exists a measurable map $\ell: W \times X^2 \to [0, 1]$ such that, for $P$-almost
every $w$ one has $\ell(w, x, x) = 0$ for every $x$, and such that

$$\|S(x, w) - S(y, w)\|_{TV} \le \ell(w, x, y) , \tag{1.5}$$

for every $w \in W$ and every $x, y \in X$.

If we are furthermore in the setting of Section 1.2.3, we assume that,
for $P$-almost every $w$, the map $(w', x, y) \mapsto \ell(w \sqcup w', x, y)$ with $w' \in W_0$ is
jointly continuous.

If we are not in that setting, we impose the stronger condition that $\ell$ is
jointly continuous.

A natural generalisation of the topological irreducibility used in
Theorem 1.4.5 is given by

Definition 1.4.7. A skew-product $(W, P, V, X, \Phi)$ is said to be topologically
irreducible if there exists $n \ge 1$ such that

$$Q^n(x, w; A \times W) > 0 ,$$

for every $x \in X$, $P$-almost every $w \in W$, and every open set $A \subset X$.

According to these definitions, the example given in Section 1.2.2 is both
strong Feller and topologically irreducible. If $p \neq 1/2$, it does however have
two distinct (even up to the equivalence relation $\sim$) invariant measures. The
problem is that in the non-Markovian case it is of course perfectly possible 
to have two distinct ergodic invariant measures for Q that are such that 
their projections on X are not mutually singular. This shows that if we are 
aiming for an extension of a statement along the lines of Theorem 1.4.5, we 
should impose some additional condition, which ideally should always be 
satisfied for Markovian systems. 

1.4.1 Off-white noise systems 

In order to proceed, we consider the measure $P$ on $W^{\mathbf{Z}}$, which is the law
of the stationary process with transition probabilities $V$ and fixed-time law
$P$. We define the coordinate maps $\Pi_i: W^{\mathbf{Z}} \to W$ in the natural way and the
shift map $\Theta$ satisfying $\Pi_i \Theta w = \Pi_{i+1} w$. We also define two natural
$\sigma$-fields on $W^{\mathbf{Z}}$. The past, $\mathscr{P}$, is defined as the $\sigma$-field generated by the
coordinate maps $\Pi_i$ for $i \le 0$. The future, $\mathscr{F}$, is defined as the $\sigma$-field
generated by all the maps of the form $w \mapsto \Phi(x, \Pi_i w)$ for $x \in X$ and $i > 0$.
With these definitions, we see that the process corresponding to our
skew-product is Markov (which is sometimes expressed by saying that the system
is a 'white noise system') if $\mathscr{P}$ and $\mathscr{F}$ are independent under $P$.

A natural weakening of the Markov property is therefore given by:

Definition 1.4.8. A skew-product $(W, P, V, X, \Phi)$ is said to be an off-white
noise system if there exists a probability measure $P_0$ that is equivalent to $P$
and such that the past $\mathscr{P}$ and the future $\mathscr{F}$ are independent under $P_0$.

Remark 1.4.9. The terminology "off-white noise system" is used by anal- 
ogy on the one hand with "white noise systems" in the theory of random 
dynamical systems and on the other hand with "off-white noise" (or "slightly 
coloured noise") as studied by Tsirelson in (Tsirelson 2000, Tsirelson 2002). 

An off-white noise system behaves, as far as ergodic properties are
concerned, pretty much like a white noise (Markovian) system. This is the
content of the following result:

Theorem 1.4.10. Let $(W, P, V, X, \Phi)$ be an off-white noise system and let
$\mu$ and $\nu$ be two stationary measures for $Q$ such that $S\mu \perp S\nu$. Then their
projections $\Pi_X^*\mu$ and $\Pi_X^*\nu$ onto the state space $X$ are also mutually singular.

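Proof. Denote by $\tilde\Phi: X \times W^{\mathbf{Z}} \to X^{\mathbf{N}}$ the solution map defined recursively
by

$$(\tilde\Phi(x, w))_0 = x , \qquad (\tilde\Phi(x, w))_n = \Phi\bigl((\tilde\Phi(x, w))_{n-1}, \Pi_n w\bigr) . \tag{1.6}$$

It follows from the construction that $\tilde\Phi$ is $\mathcal{B}(X) \otimes \mathscr{F}$-measurable, where
$\mathcal{B}(X)$ denotes the Borel $\sigma$-algebra of $X$ and $\mathscr{F}$ is the future $\sigma$-field.
Denote now by $\mu_w$ and $\nu_w$ the disintegrations of $\mu$ and $\nu$ over $W$, that is
the only (up to $P$-negligible sets) measurable functions from $W$ to $\mathcal{M}_1(X)$
such that

$$\mu(A \times B) = \int_B \mu_w(A)\, P(dw) , \qquad A \in \mathcal{B}(X) , \quad B \in \mathcal{B}(W) ,$$

and similarly for $\nu$. Using this, we construct measures $\hat\mu$ and $\hat\nu$ on
$X \times W^{\mathbf{Z}}$ by

$$\hat\mu(A \times B) = \int_B \mu_{\Pi_0 w}(A)\, P(dw) , \qquad A \in \mathcal{B}(X) , \quad B \in \mathcal{B}(W^{\mathbf{Z}}) . \tag{1.7}$$

With these constructions, one has $S\mu = \tilde\Phi^*\hat\mu$ and similarly for $\nu$. We also
define $\hat\mu_0$ by the same expression as (1.7) with $P$ replaced by $P_0$. Since
$P_0 \approx P$, one has $\hat\mu \approx \hat\mu_0$. However, since $\Pi_0$ is measurable with respect
to the past $\mathscr{P}$ and since $\mathscr{P}$ and $\mathscr{F}$ are independent under $P_0$, one has
$\hat\mu_0 \approx \Pi_X^*\hat\mu_0 \otimes P_0$ when restricted to the $\sigma$-algebra $\mathcal{B}(X) \otimes \mathscr{F}$.
This implies in particular that

$$S\mu = \tilde\Phi^*\hat\mu \approx \tilde\Phi^*\hat\mu_0 = \tilde\Phi^*(\Pi_X^*\hat\mu_0 \otimes P_0) \approx \tilde\Phi^*(\Pi_X^*\mu \otimes P) ,$$

which concludes the proof. □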

Remark 1.4.11. Our definition of off-white noise is slightly more restrictive
than the one in (Tsirelson 2002). Translated to our present setting,
Tsirelson defined $\mathscr F_n$ as the $\sigma$-field generated by all the maps of the form
$w \mapsto \Phi(x, \Pi_i w)$ for $x \in \mathcal X$ and $i > n$ and a noise was called "off-white" if
there exists $n \ge 0$ and $\hat{\mathbf P}_0 \approx \hat{\mathbf P}$ such that $\mathscr P$ and $\mathscr F_n$ are independent under
$\hat{\mathbf P}_0$.

With this definition, one could expect to be able to obtain a statement
similar to Theorem 1.4.10 with the projection on $\mathcal X$ of $\mu$ (and $\nu$) replaced
by the projection on the first $n + 1$ copies of $\mathcal X$ of the solution $\mathcal S\mu$. Such a
statement is wrong, as can be seen again by the example from Section 1.2.2.
It is however true if one defines $\mathscr F_n$ as the (larger) $\sigma$-algebra generated by
all maps of the form $w \mapsto (\hat\Phi(x, w))_i$ for $x \in \mathcal X$ and $i > n$.

A consequence of this theorem is the following equivalent of Proposi- 
tion 1.4.2: 

Proposition 1.4.12. Let $(\mathcal W, \mathbf P, \mathcal P, \mathcal X, \Phi)$ be an off-white noise system which
is strong Feller in the sense of Definition 1.4.6. If there exists $x \in \mathcal X$ such
that $x \in \operatorname{supp} \Pi_{\mathcal X}^*\mu$ for every stationary measure $\mu$ of $\mathcal Q$, then there can be
at most one such measure, up to the equivalence relation (1.3).

Proof. Assume by contradiction that there exist two distinct invariant
measures $\mu$ and $\nu$. For simplicity, denote $\mu_{\mathcal X} = \Pi_{\mathcal X}^*\mu$ and similarly for $\nu$ and
let $x$ be an element from the intersection of their supports (such an $x$ exists
by assumption). We can assume furthermore without any loss of generality
that $\mathcal S\mu \perp \mathcal S\nu$.

Define, with the same notations as in the proof of Theorem 1.4.10,

$$\mathcal S(x;\cdot) = \hat\Phi^*(\delta_x \otimes \hat{\mathbf P}) = \int_{\mathcal W} \mathcal S(x, w;\cdot)\,\mathbf P(dw)$$

and note that one has, as before, $\mathcal S\mu \approx \int \mathcal S(x;\cdot)\,\mu_{\mathcal X}(dx)$ and similarly for
$\nu$. Since $\mathcal S\mu \perp \mathcal S\nu$, this shows that there exists a measurable set $A$ such
that $\mathcal S(x, w; A) = 0$ for $\mu_{\mathcal X} \otimes \mathbf P$-almost every $(x, w)$ and $\mathcal S(x, w; A) = 1$ for
$\nu_{\mathcal X} \otimes \mathbf P$-almost every $(x, w)$. Define a function $\delta\colon \mathcal W \to \mathbf R_+$ by

$$\delta(w) = \inf\{\delta : \exists\, x_0 \text{ with } d(x_0, x) < \delta \text{ and } \mathcal S(x_0, w; A) = 0\}\;,$$

where $d$ is any metric generating the topology of $\mathcal X$. Since $x$ belongs to the
support of $\mu_{\mathcal X}$ one must have $\delta(w) = 0$ for $\mathbf P$-almost every $w$. Since we
assumed that $y \mapsto \mathcal S(y, w;\cdot)$ is continuous, this implies that $\mathcal S(x, w; A) = 0$
for $\mathbf P$-almost every $w$. Reversing the roles of $\mu$ and $\nu$, one arrives at the
fact that one also has $\mathcal S(x, w; A) = 1$ for $\mathbf P$-almost every $w$, which is the
contradiction we were looking for. □

Remark 1.4.13. It follows from the proof that it is sufficient to assume that
the map $x \mapsto \mathcal S(x, w;\cdot)$ is continuous in the total variation norm for $\mathbf P$-almost
every $w$.

1.4.2 Another quasi-Markov property 

While the result in the previous section is satisfactory in the sense that 
it shows a nice correspondence between results for Markov processes and 
results for off-white noise systems, it covers only a very restrictive class of 
systems. For example, in the case of continuous time, neither fractional noise
(the derivative of fractional Brownian motion) nor the Ornstein-Uhlenbeck
process falls into this class. It is therefore natural to look for weaker
conditions that still allow one to obtain statements similar to Theorem 1.4.5. The key
idea at this stage is to make use of the topology of W which has not been 
used in the previous section. This is also the main conceptual difference 
between the approach outlined in this article and the approach used by the 
theory of random dynamical systems. 

In the previous section, we made use of the fact that for off-white noise
systems, one has $\mathcal S(x, w;\cdot) \approx \mathcal S(x, w';\cdot)$ for every $x$ and every pair $(w, w')$
in a set of full $\mathbf P$-measure. We now consider the set

$$A = \{(w, w') \in \mathcal W^2 : \mathcal S(x, w;\cdot) \approx \mathcal S(x, w';\cdot)\}\;, \tag{1.8}$$

and we require that the dynamic on $\mathcal W$ is such that one can construct
couplings that hit $A$ with positive probability. If we think of the driving noise
as some Gaussian process, the set $A$ typically consists of pairs $(w, w')$
such that the difference $w - w'$ is sufficiently 'smooth'.

Recall that a coupling between two probability measures $\mu$ and $\nu$ is a
measure $\pi$ on the product space such that its projections on the two factors
are equal to $\mu$ and $\nu$ respectively (the typical example is $\pi = \mu \otimes \nu$ but there
exist in general many different couplings for the same pair of measures).
Given two positive measures $\mu$ and $\nu$, we say that $\pi$ is a subcoupling for
$\mu$ and $\nu$ if the projections on the two factors are smaller than $\mu$ and $\nu$
respectively. With this definition at hand, we say that:

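Definition 1.4.14. A skew-product $(\mathcal W, \mathbf P, \mathcal P, \mathcal X, \Phi)$ is said to be quasi-Markovian
if, for any two open sets $U, V \subset \mathcal W$ such that $\min\{\mathbf P(U), \mathbf P(V)\} > 0$,
there exists a measurable map $w \mapsto \mathcal P^{U,V}(w,\cdot) \in \mathscr M_+(\mathcal W^2)$ such that:

i) For $\mathbf P$-almost every $w$, the measure $\mathcal P^{U,V}(w,\cdot)$ is a subcoupling for
$\mathcal P(w,\cdot)|_U$ and $\mathcal P(w,\cdot)|_V$.

ii) Given $A$ as in (1.8), one has $\mathcal P^{U,V}(w, A) = \mathcal P^{U,V}(w, \mathcal W^2)$ for $\mathbf P$-almost
every $w$.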
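
To make the notion of subcoupling concrete, here is a minimal Python sketch for measures on a finite set. The finite setting is an assumption made purely for illustration (the text works on general spaces), and the helper names `marginals` and `is_subcoupling` are hypothetical. The sketch checks that the product measure is a coupling and that the "diagonal" measure with weights $\min\{\mu_i, \nu_i\}$ is a subcoupling concentrated on the set $\{i = j\}$.

```python
def marginals(pi):
    """Return the two marginals of a finite measure pi given as {(i, j): mass}."""
    first, second = {}, {}
    for (i, j), mass in pi.items():
        first[i] = first.get(i, 0.0) + mass
        second[j] = second.get(j, 0.0) + mass
    return first, second


def is_subcoupling(pi, mu, nu, tol=1e-12):
    """True if pi is non-negative and its marginals are dominated by mu and nu."""
    if any(mass < -tol for mass in pi.values()):
        return False
    first, second = marginals(pi)
    return (all(m <= mu.get(i, 0.0) + tol for i, m in first.items())
            and all(m <= nu.get(j, 0.0) + tol for j, m in second.items()))


mu = {0: 0.5, 1: 0.3, 2: 0.2}
nu = {0: 0.2, 1: 0.3, 2: 0.5}

# Product measure: a genuine coupling (its marginals are mu and nu themselves).
product_measure = {(i, j): mu[i] * nu[j] for i in mu for j in nu}

# Diagonal measure with weights min(mu_i, nu_i): a subcoupling supported on {i = j}.
diagonal = {(i, i): min(mu[i], nu[i]) for i in mu}
diagonal_mass = sum(diagonal.values())
```

The diagonal measure has total mass strictly less than one, which is exactly the freedom that a subcoupling allows compared with a coupling.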

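Remark 1.4.15. If we are in the setting of Section 1.2.3, this is equivalent
to considering for $U$ and $V$ open sets in $\mathcal W_0$ and replacing every occurrence
of $\mathcal P$ by $\bar{\mathcal P}$. The set $A$ should then be replaced by the set

$$\bar A = \{(w_0, w_0') \in \mathcal W_0^2 : \mathcal S(x, w \sqcup w_0;\cdot) \approx \mathcal S(x, w \sqcup w_0';\cdot) \text{ for } \mathbf P\text{-almost every } w\}\;.$$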

Remark 1.4.16. In general, the transition probabilities $\mathcal P$ only need to be
defined up to a $\mathbf P$-negligible set. In this case, the set $A$ is defined up to a set
which is negligible with respect to any coupling of $\mathbf P$ with itself. In particular,
this shows that the "quasi-Markov" property from Definition 1.4.14 does not
depend on the particular choice of $\mathcal P$.

With these definitions, we have the following result: 

Theorem 1.4.17. Let $(\mathcal W, \mathbf P, \mathcal P, \mathcal X, \Phi)$ be a quasi-Markovian skew-product
which is strong Feller in the sense of Definition 1.4.6 and topologically
irreducible in the sense of Definition 1.4.7. Then, it can have at most one
invariant measure, up to the equivalence relation (1.3).

Proof. Under slightly more restrictive assumptions, this is the content of 
(Hairer & Ohashi 2007, Theorem 3.10). It is a tedious but rather straight- 
forward task to go through the proof and to check that the arguments still 
hold under the weaker assumptions stated here. □ 

1.4.3 Discussion

The insight that we would like to convey through the presentation of the
previous two subsections is the following. If one wishes to obtain a statement
of the form "strong Feller + irreducible + quasi-Markov $\Rightarrow$ uniqueness of the
stationary measure", one should balance the regularity of $\ell$, defined in (1.5),
as a function of $w$, with the class of sets $U$ and $V$ used in Definition 1.4.14.
This in turn is closely related to the size of the set $A$ from (1.8). The larger
$A$ is, the larger the admissible class of sets in Definition 1.4.14, and the lower
the regularity requirements on $\ell$.

The off-white noise case corresponds to the situation where $A = \mathcal W^2$.
This in turn shows that one could take for $U$ and $V$ any two measurable sets
and $\mathcal P^{U,V}(w,\cdot) = \mathcal P(w,\cdot)|_U \otimes \mathcal P(w,\cdot)|_V$. Accordingly, there is no regularity
requirement (in $w$) on $\ell$, except for it being measurable.

In the case of Section 1.2.3, the transition probabilities $\mathcal P$ have a special
structure in the sense that $\mathcal P(w, A) = 1$ for $A = w \sqcup \mathcal W_0$. This implies
that one can take for $U$ and $V$ any measurable set that is such that, if we
decompose $\mathcal W$ according to $\mathcal W \approx \mathcal W \times \mathcal W_0$, the "slices" of $U$ and $V$ in $\mathcal W_0$ are
$\mathbf P$-almost surely open sets. The corresponding regularity requirement on $\ell$
is that the map $(x, y, w') \mapsto \ell(w \sqcup w', x, y)$ is jointly continuous for $\mathbf P$-almost
every $w$.

Finally, if we do not assume any special structure on $A$ or $\mathcal P$, we take for
$U$ and $V$ arbitrary open sets in $\mathcal W$. In this case, the corresponding regularity
requirement on $\ell$ is that it is jointly continuous in all of its arguments.

1.5 The Gaussian case 

In this section, we study the important particular case of Gaussian noise.
We place ourselves in the framework of Section 1.2.3 and we choose $\mathcal W_0 = \mathbf R$,
so that $\mathcal W = \mathbf R^{\mathbf Z_-}$. We furthermore assume that the measure $\mathbf P$ is centred,
stationary, and Gaussian with covariance $C$ and spectral measure $\mu$. In
other words, we define $\mu$ as the (unique) finite Borel measure on $[-\pi, \pi]$
such that

$$C_n = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{inx}\,\mu(dx) \tag{1.9}$$

holds for every $n \ge 0$. A well-known result by Maruyama, see (Maruyama
1949) or the textbook (Dym & McKean 1976, Section 3.9), states that $\mathbf P$
is ergodic for the shift map if and only if the measure $\mu$ has no atoms.
As in Section 1.2.3, denote by $\bar{\mathcal P}\colon \mathcal W \to \mathscr M_1(\mathbf R)$ the corresponding regular
conditional probabilities.

Since regular conditional probabilities of Gaussian measures are again
Gaussian (Bogachev 1998), one has

Lemma 1.5.1. There exists $\sigma \ge 0$ and a $\mathbf P$-measurable linear functional
$m\colon \mathcal W \to \mathbf R$ such that, $\mathbf P$-almost surely, the measure $\bar{\mathcal P}(w,\cdot)$ is Gaussian
with mean $m(w)$ and variance $\sigma^2$.

This however does not rule out the case where $\sigma = 0$. The answer to
the question of when $\sigma \ne 0$ is given by the following classical result in linear
prediction theory (Szegő 1920, Helson & Szegő 1960):

Theorem 1.5.2. Decompose $\mu$ as $\mu(dx) = f(x)\,dx + \mu_s(dx)$ with $\mu_s$ singular
with respect to Lebesgue's measure. Then, one has

$$\sigma^2 = \exp\Bigl(\frac{1}{2\pi} \int_{-\pi}^{\pi} \log f(x)\,dx\Bigr)\;, \tag{1.10}$$

if the expression on the right hand side makes sense and $\sigma = 0$ otherwise.
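
As a numerical illustration of formula (1.10), the following Python sketch approximates the right-hand side by the midpoint rule. It assumes the normalisation in which the covariances are $\frac{1}{2\pi}$ times the Fourier coefficients of $\mu$; the function names are hypothetical, and the two densities are chosen only as test cases: an autoregressive-type density $(1-\alpha^2)/(1+\alpha^2-2\alpha\cos x)$, for which the one-step variance should be $1-\alpha^2$, and $f(x) = 2 + 2\cos x$, which under this normalisation corresponds to $W_n = \xi_n + \xi_{n+1}$ with i.i.d. standard Gaussian $\xi_n$ and should give a value close to $1$.

```python
import math


def one_step_variance(f, n=20000):
    """Midpoint-rule approximation of exp((1/(2*pi)) * int_{-pi}^{pi} log f(x) dx)."""
    h = 2.0 * math.pi / n
    log_sum = 0.0
    for k in range(n):
        x = -math.pi + (k + 0.5) * h  # midpoints never hit the endpoints +-pi
        log_sum += math.log(f(x))
    return math.exp(log_sum * h / (2.0 * math.pi))


def ar1_density(alpha):
    """Density (1 - alpha^2) / (1 + alpha^2 - 2 alpha cos x), used as a test case."""
    return lambda x: (1.0 - alpha ** 2) / (1.0 + alpha ** 2 - 2.0 * alpha * math.cos(x))


# Autoregressive-type density: the result should be close to 1 - 0.5**2.
sigma2_ar = one_step_variance(ar1_density(0.5))

# Moving-average density 2 + 2 cos x: log f has an integrable singularity at +-pi,
# so the result is only approximately equal to 1 for finite n.
sigma2_ma = one_step_variance(lambda x: 2.0 + 2.0 * math.cos(x))
```

Note that the second density vanishes at $x = \pm\pi$ and yet yields a strictly positive $\sigma^2$: positivity of $\sigma^2$ only requires $\log f$ to be integrable.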

If $\sigma = 0$, all the randomness is contained in the remote past of the
noise and no new randomness comes in as time evolves. We will therefore
always assume that $\mu$ is non-atomic and that $\sigma^2 > 0$. Since in that case
all elements of $\mathcal W$ with only finitely many non-zero entries belong to the
reproducing kernel of $\mathbf P$ (see Section 1.7 below for the definition of the
reproducing kernel of a Gaussian measure and for the notations that follow),
the linear functional $m$ can be chosen such that, for every $n > 0$, $m(w)$ is
jointly continuous in $(w_{-n}, \ldots, w_0)$ for $\mathbf P$-almost every $(\ldots, w_{-n-2}, w_{-n-1})$,
see (Bogachev 1998, Sec. 2.10).

We will denote by $\hat{\mathbf P}$ the Gaussian measure on $\hat{\mathcal W} = \mathbf R^{\mathbf Z}$ with correlations
given by the $C_n$. We denote its covariance operator again by $C$. The measure
$\hat{\mathbf P}$ is really the same as the measure $\hat{\mathbf P}$ defined in Section 1.4.1 if we make the
necessary identification of $\hat{\mathcal W}$ with a subset of $\mathcal W^{\mathbf N}$; this is why we use the
same notation without risking confusion. We also introduce the equivalents
to the two $\sigma$-algebras $\mathscr P$ and $\mathscr F$. We interpret them as $\sigma$-algebras on $\mathbf R^{\mathbf Z}$, so
that $\mathscr P$ is the $\sigma$-algebra generated by the $\Pi_n$ with $n \le 0$ and $\mathscr F$ is generated
by the $\Pi_n$ with $n > 0$. (Actually, $\mathscr F$ could be slightly smaller than that
in general, but we do not want to restrict ourselves to one particular
skew-product, and so we simply take for $\mathscr F$ the smallest choice which contains all
'futures' for all possible choices of $\Phi$ as in Section 1.2.3.)

It is natural to split $\hat{\mathcal W}$ as $\hat{\mathcal W} = \mathcal W_- \oplus \mathcal W_+$ where $\mathcal W_- \approx \mathcal W$ is the
span of the images of the $\Pi_n$ with $n \le 0$ and similarly for $\mathcal W_+$. We denote
by $\mathcal H$ the reproducing kernel Hilbert space of $\hat{\mathbf P}$. Recall that via the map
$\hat{\mathcal W}^* \ni \Pi_n \mapsto e^{inx}$ and the inclusion $\hat{\mathcal W}^* \subset \mathcal H$, one has the isomorphism
$\mathcal H \approx L^2(\mu)$ with $\mu$ as in (1.9), see for example (Dym & McKean 1976).
Following the construction of Section 1.7, we see that $\hat{\mathcal H}^p_-$ is given by the
closure in $\mathcal H$ of the span of $e^{inx}$ for $n \le 0$ and similarly for $\hat{\mathcal H}^p_+$. Denote by
$P_\pm$ the orthogonal projection from $\hat{\mathcal H}^p_-$ to $\hat{\mathcal H}^p_+$ and by $P$ the corresponding
operator from $\mathcal H^p_-$ to $\mathcal H^p_+$.

With all these preliminaries in place, an immediate consequence of
Proposition 1.7.3 is:

Proposition 1.5.3. Let $\hat P\colon \mathcal W_- \to \mathcal W_+$ be the $\mathbf P$-measurable extension of $P$.
Then, the set $A$ is equal (up to a negligible set in the sense of Remark 1.4.16)
to $\{(w, w') : \hat P(w - w') \in \mathcal H^c_+\}$. In particular, it always contains the set

$$\{(w, w') : w - w' \in \mathcal H \cap \mathcal W_-\}\;.$$

Proof. The first statement follows from the fact that, by (1.17), $\mathcal H^c_+$ is the
reproducing kernel space of the conditional probability of $\hat{\mathbf P}$, given the past,
and $\hat P(w - w')$ yields the shift between the conditional probability given
$w$ and the conditional probability given $w'$. The second statement follows
from the fact that $P$ extends to a bounded operator from $\mathcal H^c_-$ to $\mathcal H^c_+$. □

1.5.1 The quasi-Markov property 

We assume as above that $\mathcal W_0 = \mathbf R$ and that $\mathbf P$ is a stationary Gaussian
measure with spectral measure $\mu$. We also write as before $\mu(dx) = f(x)\,dx +
\mu_s(dx)$. The main result of this section is that the quasi-Markov property
introduced in Section 1.4 can easily be read off from the behaviour of the
spectral measure $\mu$:

Theorem 1.5.4. A generic random dynamical system as above is
quasi-Markovian if and only if $f$ is almost everywhere positive and $\int_{-\pi}^{\pi} \frac{1}{f(x)}\,dx$ is
finite.

Proof. Let $e_n$ be the 'unit vectors' defined by $\Pi_m e_n = \delta_{mn}$. Then the
condition of $\int_{-\pi}^{\pi} \frac{1}{f(x)}\,dx$ being finite is equivalent to $e_n$ belonging to the
reproducing kernel of $\hat{\mathbf P}$, a classical result dating back to Kolmogorov, see
also (Grenander & Rosenblatt 1957, p. 83).

To show that the condition is sufficient, denote by $D_w(x)$ the (Gaussian)
density of $\bar{\mathcal P}(w,\cdot)$ with respect to Lebesgue measure on $\mathbf R$. Given any two
open sets $U$ and $V$ in $\mathbf R$, we can find some $x$, $y$, and $r > 0$ such that $B(x, r) \subset
U$ and $B(y, r) \subset V$. Take then for $\bar{\mathcal P}^{U,V}(w,\cdot)$ the push-forward under the
map $z \mapsto (z, z + y - x)$ of the measure with density $z \mapsto \min\{D_w(z), D_w(z +
y - x)\}$ with respect to Lebesgue measure restricted to $B(x, r)$. Since, by
Proposition 1.5.3, $A$ contains all pairs $(w, w')$ which differ by an element of
$\mathcal H \cap \mathcal W_-$ and since the condition of the theorem is precisely what is required
for $e_0$ to belong to $\mathcal H$, this shows the sufficiency of the condition.
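
The construction used in the sufficiency part can be checked numerically. The following Python sketch computes the total mass of the measure with density $z \mapsto \min\{D(z), D(z + y - x)\}$ on the ball $B(x, r)$, where $D$ is a Gaussian density, and compares it with the Gaussian mass of $B(x, r)$ and $B(y, r)$. The numerical values of the mean, the standard deviation, the centres and the radius are hypothetical and chosen only for illustration, as are the function names.

```python
import math


def gaussian_density(z, mean, sigma):
    """Density of N(mean, sigma^2) at z."""
    return math.exp(-0.5 * ((z - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def gaussian_cdf(t, mean, sigma):
    """Distribution function of N(mean, sigma^2) at t."""
    return 0.5 * (1.0 + math.erf((t - mean) / (sigma * math.sqrt(2.0))))


def gaussian_mass(a, b, mean, sigma):
    """Mass that N(mean, sigma^2) gives to the interval (a, b)."""
    return gaussian_cdf(b, mean, sigma) - gaussian_cdf(a, mean, sigma)


def subcoupling_mass(x, y, r, mean, sigma, n=20000):
    """Midpoint-rule mass of the density min{D(z), D(z + y - x)} on B(x, r)."""
    h = 2.0 * r / n
    total = 0.0
    for k in range(n):
        z = x - r + (k + 0.5) * h
        total += min(gaussian_density(z, mean, sigma),
                     gaussian_density(z + y - x, mean, sigma))
    return total * h


mean, sigma = 0.3, 1.0      # hypothetical conditional mean m(w) and standard deviation
x, y, r = -1.0, 2.0, 0.5    # hypothetical balls B(x, r) and B(y, r)

mass = subcoupling_mass(x, y, r, mean, sigma)
mass_ball_x = gaussian_mass(x - r, x + r, mean, sigma)
mass_ball_y = gaussian_mass(y - r, y + r, mean, sigma)
```

The mass is strictly positive because Gaussian densities never vanish, and it is dominated by the mass of either ball, which is what the subcoupling property requires of the two marginals.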

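To show that the condition is necessary as well, suppose that it does not
hold and take for example $U = (-\infty, 0)$ and $V = [0, \infty)$. Since we have the
standing assumption that $\sigma^2 > 0$ with $\sigma^2$ as in (1.10), one has $\bar{\mathcal P}(w, U) > 0$
for $\mathbf P$-almost every $w$ and similarly for $V$. Assume by contradiction that the
system is quasi-Markovian, so we can construct a measure on $\mathbf R^{\mathbf Z} \times \mathbf R^{\mathbf Z}$ in
the following way. Define $\mathcal W_+$ as before and define $\mathcal W_-$ as the span of the
images of the $\Pi_n$ for $n < 0$ so that $\hat{\mathcal W} = \mathcal W_- \oplus \mathcal W_0 \oplus \mathcal W_+$. Let
$\mathcal P_\pm\colon \mathcal W_- \oplus \mathcal W_0 \to \mathscr M_1(\mathcal W_+)$ be the conditional probability of $\hat{\mathbf P}$ given
$\mathcal W_- \oplus \mathcal W_0$. Let $\bar{\mathcal P}^{U,V}\colon \mathcal W_- \to \mathscr M_+(\mathcal W_0^2)$ be as in Definition 1.4.14 and
construct a measure $M$ on $\mathcal W_-^2 \times \mathcal W_+^2$ by

$$M(A_1 \times A_2 \times B_1 \times B_2) = \int_{A_1 \cap A_2} \int_{A} \mathcal P_\pm(w_- \sqcup w_0, B_1)\, \mathcal P_\pm(w_- \sqcup w_0', B_2)\, \bar{\mathcal P}^{U,V}(w_-, dw_0, dw_0')\, \mathbf P(dw_-)\;.$$

This measure has the following properties:

1. By the properties of $\bar{\mathcal P}^{U,V}$ and by the definition of $\mathcal P_\pm$, it is a
subcoupling for the projection of $\hat{\mathbf P}$ on $\mathcal W_- \times \mathcal W_+$ with itself, and it is not the
trivial measure.

2. Denote by $M_1$ and $M_2$ the projections of $M$ on the two copies
of $\mathcal W_- \times \mathcal W_+$. Since $\mathcal P_\pm(w_- \sqcup w_0,\cdot) \approx \mathcal P_\pm(w_- \sqcup w_0',\cdot)$ for $\mathbf P$-almost every
$w_-$ and for every pair $(w_0, w_0') \in A$, one has $M_1 \approx M_2$.

On the other hand, since $e_0$ does not belong to the reproducing kernel of
$\hat{\mathbf P}$ by assumption, there exists a $\hat{\mathbf P}$-measurable linear map $m\colon \mathcal W_- \times \mathcal W_+ \to \mathcal W_0$
such that the identity $w_0 = m(w_-, w_+)$ holds for $\hat{\mathbf P}$-almost every triple
$(w_-, w_0, w_+)$, see Proposition 1.7.3. Denote by $A$ the preimage of $U$ under $m$
in $\mathcal W_- \times \mathcal W_+$ and by $A^c$ its complement. Then one has $M_1(A^c) = M_2(A) = 0$,
which contradicts property 2 above. □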

Note that although the condition of this theorem is easy to read off
from the spectral measure, it is in general not so straightforward to read off
from the behaviour of the correlation function $C$. In particular, it does not
translate into a decay condition on the coefficients $C_n$. Take for example the
case

$$C_n = \begin{cases} 2 & \text{if } n = 0, \\ 1 & \text{if } n = 1, \\ 0 & \text{otherwise.} \end{cases}$$

This can be realised for example by taking for $\xi_n$ a sequence of i.i.d. standard
Gaussian random variables and setting $W_n = \xi_n + \xi_{n+1}$. We can check that
one has, for every $N > 0$, the identity

$$W_0 = \frac{1}{N} \sum_{n=1}^{N} (-1)^n \bigl(\xi_{-n} + \xi_{n+1}\bigr) - \frac{1}{N} \sum_{n=1}^{N} (-1)^n (N + 1 - n)\bigl(W_n + W_{-n}\bigr)\;.$$

Since the first term converges to $0$ almost surely by the law of large numbers,
it follows that one has the almost sure identity

$$W_0 = -\lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} (-1)^n (N + 1 - n)\bigl(W_n + W_{-n}\bigr)\;,$$

which shows that $W_0$ can be determined from the knowledge of the $W_n$ for
$n \ne 0$. In terms of the spectral measure, this can be seen from the fact that
$f(x) = 1 + \cos(x)$, so that $1/f$ has a non-integrable singularity at $x = \pi$.
This also demonstrates that there are cases in which the reproducing kernel
of $\mathbf P$ contains all elements with finitely many non-zero entries, even though the
reproducing kernel of $\hat{\mathbf P}$ contains no such elements.
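
The reconstruction of $W_0$ from the other coordinates can be tested on a simulated sample. A direct telescoping computation from $W_n = \xi_n + \xi_{n+1}$ shows that, for every finite $N$, $W_0$ equals the truncated estimate $-\frac1N\sum_{n=1}^N(-1)^n(N+1-n)(W_n+W_{-n})$ plus the remainder $\frac1N\sum_{n=1}^N(-1)^n(\xi_{-n}+\xi_{n+1})$. The Python sketch below checks this exact identity; the seed, the truncation level and the function name `reconstruct_w0` are arbitrary choices made for illustration.

```python
import random


def reconstruct_w0(W, N):
    """Truncated estimate of W_0; reads only the entries W[n] with n != 0."""
    total = sum((-1) ** n * (N + 1 - n) * (W[n] + W[-n]) for n in range(1, N + 1))
    return -total / N


rng = random.Random(0)
N = 2000

# i.i.d. standard Gaussian xi_n for n = -N, ..., N + 1.
xi = {n: rng.gauss(0.0, 1.0) for n in range(-N, N + 2)}

# Moving-average sequence W_n = xi_n + xi_{n+1} for n = -N, ..., N.
W = {n: xi[n] + xi[n + 1] for n in range(-N, N + 1)}

estimate = reconstruct_w0(W, N)

# Remainder term of the finite-N identity; it is an alternating average of
# i.i.d. Gaussians and therefore small for large N.
remainder = sum((-1) ** n * (xi[-n] + xi[n + 1]) for n in range(1, N + 1)) / N
```

The estimate never reads $W_0$ itself, which illustrates why $e_0$ cannot belong to the reproducing kernel in this example.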

1.5.2 The strong Feller property 

It turns out that in the case of discrete stationary Gaussian noise, the quasi- 
Markov and the strong Feller properties are very closely related. In this 
section, we assume that we are again in the framework of Section 1.2.3, but
we take $\mathcal W_0 = \mathbf R^d$ and we assume that the driving noise consists of $d$
independent stationary Gaussian sequences with spectral measures satisfying
the condition of Theorem 1.5.4.

We are going to derive a criterion for the strong Feller property for the 
Markovian case where the driving noise consists of d independent sequences 
of i.i.d. Gaussian random variables and we will see that this criterion still 
works in the quasi-Markovian case. 

It will be convenient for the purpose of this section to introduce the
Fréchet space $L^\Gamma(\mathbf R^d)$ consisting of measurable functions $f\colon \mathbf R^d \to \mathbf R$ such
that the norms $\|f\|_{\gamma,p}^p = \int_{\mathbf R^d} |f(w)|^p e^{-\gamma |w|^2}\,dw$ are finite for all $\gamma > 0$ and all
$p \ge 1$. For example, since these norms are increasing in $p$ and decreasing
in $\gamma$, $L^\Gamma(\mathbf R^d)$ can be endowed with the distance

$$d(f, g) = \sum_{p=1}^{\infty} \sum_{n=1}^{\infty} 2^{-n-p} \bigl(1 \wedge \|f - g\|_{1/n,p}\bigr)\;.$$

With this notation, we will say that a function $g\colon \mathbf R^n \times \mathbf R^d \to \mathbf R^m$ belongs
to $C^{0,\Gamma}(\mathbf R^n \times \mathbf R^d)$ if, for every $i \in \{1, \ldots, m\}$, the map $x \mapsto g_i(x,\cdot)$ is
continuous from $\mathbf R^n$ to $L^\Gamma(\mathbf R^d)$.

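Given a function $\Phi\colon \mathbf R^n \times \mathbf R^d \to \mathbf R^n$ with elements of $\mathbf R^n$ denoted by $x$
and elements of $\mathbf R^d$ denoted by $w$, we also define the "Malliavin covariance
matrix" of $\Phi$ by

$$M^\Phi_{ij}(x, w) = \sum_{k=1}^{d} \partial_{w_k}\Phi_i(x, w)\, \partial_{w_k}\Phi_j(x, w)\;.$$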

With these notations, we have the following criterion: 

Proposition 1.5.5. Let $\Phi \in C^2(\mathbf R^n \times \mathbf R^d; \mathbf R^n)$ be such that the derivatives
$D_w\Phi$, $D_w D_x\Phi$, and $D_w^2\Phi$ all belong to $C^{0,\Gamma}(\mathbf R^n \times \mathbf R^d)$. Assume furthermore
that $M^\Phi$ is invertible for Lebesgue-almost every $(x, w)$ and that $(\det M^\Phi)^{-1}$
belongs to $C^{0,\Gamma}(\mathbf R^n \times \mathbf R^d)$. Then, the Markov semigroup over $\mathbf R^n$ defined by

$$(\mathcal P f)(x) = \int_{\mathbf R^d} f(\Phi(x, w))\,\Gamma(dw)\;,$$

where $\Gamma$ is an arbitrary non-degenerate Gaussian measure on $\mathbf R^d$, has the
strong Feller property.

Proof. Take a function $f \in C_0^\infty(\mathbf R^n)$ and write (in this proof we use Einstein's
convention of summation over repeated indices):

$$(\partial_i \mathcal P f)(x) = \int_{\mathbf R^d} (\partial_j f)(\Phi(x, w))\, \partial_{x_i}\Phi_j(x, w)\,\Gamma(dw)\;. \tag{1.11}$$

At this point, we note that since we assumed $M^\Phi$ to be invertible, one has
for every pair $(i, j)$ the identity

$$\partial_{x_i}\Phi_j(x, w) = \partial_{w_m}\Phi_j(x, w)\, \Xi_{mi}(x, w)\;, \tag{1.12}$$

where

$$\Xi_{mi} = \partial_{w_m}\Phi_k(x, w)\, \bigl(M^\Phi(x, w)\bigr)^{-1}_{k\ell}\, \partial_{x_i}\Phi_\ell(x, w)\;.$$

This allows us to integrate (1.11) by parts, yielding

$$(\partial_i \mathcal P f)(x) = -\int_{\mathbf R^d} f(\Phi(x, w)) \bigl(\partial_{w_m} - (Qw)_m\bigr) \Xi_{mi}(x, w)\,\Gamma(dw)\;,$$

where $Q$ is the inverse of the covariance matrix of $\Gamma$. Our assumptions
then ensure the existence of a continuous function $K\colon \mathbf R^n \to \mathbf R$ such that
$|(\partial_i \mathcal P f)(x)| \le K(x) \sup_y |f(y)|$ which, by a standard approximation
argument (Da Prato & Zabczyk 1996, Chapter 7), is sufficient for the strong
Feller property to hold. □
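
The algebraic identity (1.12) can be checked numerically. The Python sketch below uses a hypothetical map $\Phi\colon \mathbf R^2 \times \mathbf R^3 \to \mathbf R^2$ (chosen only for illustration, with $n = 2$ and $d = 3$), approximates its derivatives by central differences, forms the Malliavin covariance matrix $M^\Phi = (D_w\Phi)(D_w\Phi)^T$ and the matrix $\Xi$, and verifies that $\partial_{x_i}\Phi_j = \partial_{w_m}\Phi_j\,\Xi_{mi}$.

```python
import math


def phi(x, w):
    """Hypothetical map Phi: R^2 x R^3 -> R^2, chosen only for illustration."""
    return [x[0] + w[0] + 0.5 * math.sin(x[1]) * w[2],
            x[1] + w[1] + 0.5 * math.cos(x[0]) * w[2]]


def jacobian(func, point, eps=1e-6):
    """Central-difference Jacobian; entry [j][k] approximates d func_j / d point_k."""
    rows = len(func(point))
    jac = [[0.0] * len(point) for _ in range(rows)]
    for k in range(len(point)):
        up = list(point)
        down = list(point)
        up[k] += eps
        down[k] -= eps
        f_up, f_down = func(up), func(down)
        for j in range(rows):
            jac[j][k] = (f_up[j] - f_down[j]) / (2.0 * eps)
    return jac


x, w = [0.4, -0.7], [0.1, 0.9, -1.3]
Jw = jacobian(lambda v: phi(x, v), w)   # 2 x 3 matrix of d_{w_k} Phi_j
Jx = jacobian(lambda v: phi(v, w), x)   # 2 x 2 matrix of d_{x_i} Phi_j

# Malliavin covariance matrix M = Jw Jw^T (2 x 2), its determinant and inverse.
M = [[sum(Jw[i][k] * Jw[j][k] for k in range(3)) for j in range(2)] for i in range(2)]
det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
Minv = [[M[1][1] / det, -M[0][1] / det], [-M[1][0] / det, M[0][0] / det]]

# Xi_{mi} = d_{w_m} Phi_k (M^{-1})_{kl} d_{x_i} Phi_l, a 3 x 2 matrix.
Xi = [[sum(Jw[k][m] * Minv[k][l] * Jx[l][i] for k in range(2) for l in range(2))
       for i in range(2)] for m in range(3)]

# Largest violation of identity (1.12) over all pairs (i, j).
residual = max(abs(Jx[j][i] - sum(Jw[j][m] * Xi[m][i] for m in range(3)))
               for i in range(2) for j in range(2))
```

The identity holds for any matrices with invertible $M^\Phi$, since $(D_w\Phi)\,\Xi = M^\Phi (M^\Phi)^{-1} D_x\Phi$; the residual therefore only reflects rounding errors.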
Remark 1.5.6. We could easily have replaced $\mathbf R^n$ by an $n$-dimensional
Riemannian manifold with the obvious changes in the definitions of the various
objects involved.

Remark 1.5.7. Just as in the case of the Hörmander condition for the
hypoellipticity of a second-order differential operator, the conditions given here
are not far from being necessary. Indeed, if $M^\Phi(x,\cdot)$ fails to be invertible
on some open set in $\mathbf R^d$, then the image of this open set under $\Phi(x,\cdot)$ will
be a set of dimension $n' < n$. In other words, the process starting from $x$
will stay in some subset of lower dimension $n'$ with positive probability, so
that the transition probabilities will not have a density with respect to the
Lebesgue measure.

Remark 1.5.8. Actually, this condition gives quite a bit more than the
strong Feller property, since it gives local Lipschitz continuity of the
transition probabilities in the total variation distance with local Lipschitz constant
$K(x)$.

We now show that if we construct a skew-product from $\Phi$ and take as
driving noise $d$ independent copies of a stationary Gaussian process with
a covariance structure satisfying the assumption of Theorem 1.5.4, then
the assumptions of Proposition 1.5.5 are sufficient to guarantee that it also
satisfies the strong Feller property in the sense of Definition 1.4.6. We have
indeed that:

Theorem 1.5.9. Let $\mathcal W_0 = \mathbf R^d$, $\mathcal W = \mathcal W_0^{\mathbf Z_-}$, let $\Phi\colon \mathbf R^n \times \mathcal W_0 \to \mathbf R^n$ satisfy
the assumptions of Proposition 1.5.5, and let $\mathbf P \in \mathscr M_1(\mathcal W)$ be a Gaussian
measure such that there exist measures $\mu_1, \ldots, \mu_d$ with



Then, if the absolutely continuous part of each of the $\mu_j$ satisfies the
condition of Theorem 1.5.4, the skew-product $(\mathcal W, \mathbf P, \mathcal P, \mathbf R^n, \Phi)$ has the strong
Feller property.

Proof. Let $m$ be as in Lemma 1.5.1, let $x \in \mathbf R^n$, and let $w \in \mathcal W$ be such that
$m(w) < \infty$. We want to show that there exists a function $K$
depending continuously on $x$ and on $m(w)$ such that $\mathcal S(x, w;\cdot)$ is locally
Lipschitz continuous in the total variation distance (as a function of $x$) with
local Lipschitz constant $K(x, m(w))$. Since we assumed from the beginning
that $\sigma^2 > 0$, where $\sigma$ is defined as in (1.10), we know that the set of
elements in $\mathcal W$ with only finitely many non-zero coordinates belongs to the
reproducing kernel space of $\mathbf P$. Since, by (1.16), the map $m$ is bounded from
the reproducing kernel space of $\mathbf P$ into $\mathbf R$, $m(w)$ depends continuously on
each of the coordinates of $w$ and so the assumptions of Definition 1.4.6 are
verified.

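It remains to construct $K$. This will be done in a way that is almost
identical to the proof of Proposition 1.5.5. Take a bounded smooth test
function $f\colon (\mathbf R^n)^{\mathbf N} \to \mathbf R$ which depends only on its first $N$ coordinates and
consider the function $(\mathcal S f)(x, w)$ defined by

$$(\mathcal S f)(x, w) = \int_{(\mathbf R^n)^{\mathbf N}} f(y)\, \mathcal S(x, w; dy)\;.$$

Consider now the splitting $\hat{\mathcal W} = \mathcal W_- \oplus \mathcal W_+$, as well as the measurable linear
map $\hat P$ and the space $\mathcal H^c_+$ introduced for the statement of Proposition 1.5.3
(note that $\hat P$ relates to $m$ via $(\hat P w)_0 = m(w)$). Denote furthermore by $\mathbf P_+$
the Gaussian measure on $\mathcal W_+$ with reproducing kernel space $\mathcal H^c_+$. With these
notations at hand, we have the expression

$$(\mathcal S f)(x, w) = \int_{\mathcal W_+} f\bigl(\hat\Phi(x, \tilde w + \hat P w)\bigr)\, \mathbf P_+(d\tilde w)\;,$$

where we denoted by $\hat\Phi\colon \mathcal X \times \mathcal W_+ \to \mathcal X^{\mathbf N}$ the map defined in (1.6). We see
that, as in (1.12), one has the identity (again, summation over repeated
indices is implied):

$$\partial_{x_i}\hat\Phi(x, w) = \partial_{w_0^m}\hat\Phi(x, w)\, \Xi_{mi}(x, w_0)\;,$$

where the function $\Xi$ is exactly the same as in (1.12). At this point, since the
'coordinate vectors' $e_0^m$ belong to the reproducing kernel of $\hat{\mathbf P}$, and therefore
also of $\mathbf P_+$ by (1.15), we can integrate by parts against the Gaussian measure
$\mathbf P_+$ (Bogachev 1998, Theorem 5.1.8) to obtain

$$\partial_i \mathcal S f(x, w) = -\int_{\mathcal W_+} f\bigl(\hat\Phi(x, \tilde w + \hat P w)\bigr)\, \partial_{w_0^m}\Xi_{mi}(x, \tilde w_0 + m(w))\, \mathbf P_+(d\tilde w)$$

$$\qquad\qquad + \int_{\mathcal W_+} f\bigl(\hat\Phi(x, \tilde w + \hat P w)\bigr)\, \Xi_{mi}(x, \tilde w_0 + m(w))\, e_0^m(\tilde w)\, \mathbf P_+(d\tilde w)\;.$$

Here, we made an abuse of notation and interpreted $e_0^m$ as a measurable linear
functional on $\mathcal W_+$, via the identification (1.14). Since $\int |e_0^m(\tilde w)|^2\, \mathbf P_+(d\tilde w) <
\infty$ by assumption and since the law of $\tilde w_0$ under $\mathbf P_+$ is centred Gaussian with
variance $\sigma^2$, this concludes the proof. □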

1.5.3 The off-white noise case

The question of which stationary Gaussian sequences correspond to
off-white noise systems was solved by Ibragimov and Solev in the seventies,
see (Ibragimov & Rozanov 1978) and also (Tsirelson 2002). It turns out
that the correct criterion is:

Theorem 1.5.10. The random dynamical system is off-white if and only
if the spectral measure $\mu$ has a density $f$ with respect to Lebesgue measure
and $f(\lambda) = \exp \phi(\lambda)$ for some function $\phi$ belonging to the fractional Sobolev
space $H^{1/2}$.

Remark 1.5.11. It follows from a well-known result by (Trudinger 1967),
later extended in (Strichartz 1971/72), that any function $\phi \in H^{1/2}$ satisfies
$\int \exp(\phi^2(x))\,dx < \infty$. In particular, this shows that the condition of the
previous theorem is much stronger than the condition of Theorem 1.5.4
which is required for the quasi-Markov property.

As an example, the (Gaussian) stationary autoregressive process, which
has covariance structure $C_n = \alpha^n$, does have the quasi-Markov property since
its spectral measure has a density of the form

$$\mu(dx) = \frac{1 - \alpha^2}{1 + \alpha^2 - 2\alpha\cos(x)}\,dx\;,$$

which is smooth and bounded away from zero. However, if we take a
sequence $\xi_n$ of i.i.d. normal random variables and define a process $X_n$ by

$$X_n = \sum_{k=1}^{\infty} k^{-\beta}\, \xi_{n-k}$$

for some $\beta > 1/2$, then $X_n$ does still have the quasi-Markov property, but
it is not an off-white noise.
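
The claims about the autoregressive example can be checked numerically. The Python sketch below assumes the normalisation in which the covariances are $\frac{1}{2\pi}$ times the Fourier coefficients of the spectral density (an assumption about the convention, not something stated in this passage), and verifies for a sample value of $\alpha$ that the density $(1-\alpha^2)/(1+\alpha^2-2\alpha\cos x)$ reproduces $C_n = \alpha^n$ and that $\int 1/f$ is finite, as required by Theorem 1.5.4. The function names are hypothetical.

```python
import math


def ar1_density(alpha):
    """Spectral density (1 - alpha^2) / (1 + alpha^2 - 2 alpha cos x)."""
    return lambda x: (1.0 - alpha ** 2) / (1.0 + alpha ** 2 - 2.0 * alpha * math.cos(x))


def midpoint_integral(g, n=4000):
    """Midpoint rule for the integral of g over [-pi, pi]."""
    h = 2.0 * math.pi / n
    return h * sum(g(-math.pi + (k + 0.5) * h) for k in range(n))


alpha = 0.6
f = ar1_density(alpha)

# Covariances (1 / 2 pi) * int cos(n x) f(x) dx; the sine part vanishes by symmetry.
covariances = [midpoint_integral(lambda x, n=n: math.cos(n * x) * f(x)) / (2.0 * math.pi)
               for n in range(4)]

# Integral of 1/f over [-pi, pi]; in closed form it equals
# 2 pi (1 + alpha^2) / (1 - alpha^2).
inverse_integral = midpoint_integral(lambda x: 1.0 / f(x))
```

Since $1/f$ is a trigonometric polynomial of degree one here, its integral is finite for every $|\alpha| < 1$, in contrast with the density $1 + \cos(x)$ discussed in Section 1.5.1.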

1.6 Appendix A: Equivalence of the strong and 
ultra Feller properties 

In this section, we show that, even though the ultra Feller property seems at 
first sight to be stronger than the strong Feller property, the composition of 
two Markov transition kernels satisfying the strong Feller property always 
satisfies the ultra Feller property. This fact had already been pointed out 
in (Dellacherie & Meyer 1983) but had been overlooked by a large part of 
the probability community until Seidler 'rediscovered' it in 2001 (Seidler
2001). We take the opportunity to give an elementary proof of this fact. 
Its structure is based on the notes by Seidler, but we take advantage of the 
simplifying fact that we only work with Polish spaces. 
We introduce the following definition: 

Definition 1.6.1. A Markov transition kernel $P$ over a Polish space $\mathcal X$
satisfies the ultra Feller property if the transition probabilities $P(x,\cdot)$ are
continuous in the total variation norm.

Recall first the following well-known fact of real analysis, see for example
(Yosida 1995, Example IV.9.3):

Proposition 1.6.2. For any measure space $(\Omega, \mathcal F, \lambda)$ such that $\mathcal F$ is
countably generated and any $p \in [1, \infty)$, one has $L^p(\Omega, \lambda)' = L^q(\Omega, \lambda)$ with
$q^{-1} + p^{-1} = 1$. In particular, this is true with $p = 1$.

As a consequence, one has

Corollary 1.6.3. Assume that $\mathcal F$ is countably generated and let $g_n$ be a
bounded sequence in $L^\infty(\Omega, \lambda)$. Then there exists a subsequence $g_{n_k}$ and an
element $g \in L^\infty(\Omega, \lambda)$ such that $\int g_{n_k}(x) f(x)\,\lambda(dx) \to \int g(x) f(x)\,\lambda(dx)$ for
every $f \in L^1(\Omega, \lambda)$.

Proof. Since $\mathcal F$ is countably generated, $L^1(\Omega, \lambda)$ is separable and therefore
contains a countable dense subset $\{f_m\}$. Since the $g_n$ are uniformly bounded,
a diagonal argument allows us to exhibit an element $g \in L^1(\Omega, \lambda)'$ and a
subsequence $n_k$ such that $\int g_{n_k}(x) f_m(x)\,\lambda(dx) \to \langle f_m, g\rangle$ for every $m$. The claim
follows from the density of the set $\{f_m\}$ and the previous proposition. □

Note also that one has 

Lemma 1.6.4. Let $P$ be a strong Feller Markov kernel on a Polish space
$\mathcal X$. Then there exists a probability measure $\lambda$ on $\mathcal X$ such that $P(x,\cdot)$ is
absolutely continuous with respect to $\lambda$ for every $x \in \mathcal X$.

Proof. Let $\{x_n\}$ be a countable dense subset of $\mathcal X$ and define a probability
measure $\lambda$ by $\lambda(A) = \sum_{n=1}^{\infty} 2^{-n} P(x_n, A)$. Let $x \in \mathcal X$ be arbitrary and
assume by contradiction that $P(x,\cdot)$ is not absolutely continuous with respect
to $\lambda$. This implies that there exists a set $A$ with $\lambda(A) = 0$ but $P(x, A) \ne 0$.
Set $f = \chi_A$ and consider $Pf$. On one hand, $(Pf)(x) = P(x, A) > 0$. On
the other hand, $(Pf)(x_n) = 0$ for every $n$. Since $P$ is strong Feller, $Pf$ must
be continuous, thus leading to a contradiction. □
Finally, to complete the preliminaries, set $\mathcal B = \{g \in \mathcal B_b(\mathcal X) \mid \sup_x |g(x)| \le
1\}$ the unit ball in the space of bounded measurable functions, and note that
one has the following alternative formulation of the ultra Feller property:

Lemma 1.6.5. A Markov kernel $P$ on a Polish space $\mathcal X$ is ultra Feller if
and only if the set of functions $\{Pg \mid g \in \mathcal B\}$ is equicontinuous.

Proof. This is an immediate consequence of the fact that one has the
characterisation $\|P(x,\cdot) - P(y,\cdot)\|_{\mathrm{TV}} = \sup_{g \in \mathcal B} |Pg(x) - Pg(y)|$, see for example
(Villani 2003, Example 1.17). □
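
On a finite state space, the characterisation of the total variation norm as a supremum over the unit ball can be verified by brute force. The Python sketch below is only an illustration under that finiteness assumption; the two probability vectors stand for two rows $P(x,\cdot)$ and $P(y,\cdot)$ of a hypothetical kernel, and the function names are not taken from the text.

```python
from itertools import product


def total_variation(mu, nu):
    """Total variation norm of mu - nu on a finite set, with the convention
    ||mu - nu||_TV = sup over |g| <= 1 of |sum_i g_i (mu_i - nu_i)|."""
    return sum(abs(a - b) for a, b in zip(mu, nu))


def sup_over_sign_functions(mu, nu):
    """Brute-force supremum over functions g with values in {-1, +1}.

    The map g -> sum_i g_i (mu_i - nu_i) is linear, so its supremum over the
    unit ball is attained at one of these extreme points.
    """
    return max(abs(sum(g_i * (a - b) for g_i, a, b in zip(g, mu, nu)))
               for g in product((-1.0, 1.0), repeat=len(mu)))


# Two rows P(x, .) and P(y, .) of a hypothetical Markov kernel on four states.
row_x = [0.1, 0.4, 0.3, 0.2]
row_y = [0.25, 0.25, 0.25, 0.25]

tv_direct = total_variation(row_x, row_y)
tv_sup = sup_over_sign_functions(row_x, row_y)
```

With this convention the total variation norm of the difference of two probability measures takes values in $[0, 2]$.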

We have now all the ingredients necessary for the proof of the result 
announced earlier. 

Theorem 1.6.6. Let X be a Polish space and let P and Q be two strong 
Feller Markov kernels on X. Then the Markov kernel PQ is ultra Feller. 

Proof. Applying Lemma 1.6.4 to $Q$, we see that there exists a reference
measure $\lambda$ such that $Q(y, dz) = k(y, z)\,\lambda(dz)$.

Suppose by contradiction that $R = PQ$ is not ultra Feller. Therefore, by
Lemma 1.6.5 there exists an element $x \in \mathcal X$, a sequence $g_n \in \mathcal B$, a sequence
$x_n$ converging to $x$, and a value $\delta > 0$ such that

$$Rg_n(x_n) - Rg_n(x) > \delta\;, \tag{1.13}$$

for every $n$. Interpreting the $g_n$'s as elements of $L^\infty(\mathcal X, \lambda)$, it follows from
Corollary 1.6.3 that, extracting a subsequence if necessary, we can assume
that there exists an element $g \in L^\infty(\mathcal X, \lambda)$ such that

$$\lim_{n \to \infty} Qg_n(y) = \lim_{n \to \infty} \int k(y, z) g_n(z)\,\lambda(dz) = \int k(y, z) g(z)\,\lambda(dz) = Qg(y)$$

for every $y \in \mathcal X$. (This is because $k(y,\cdot) \in L^1(\mathcal X, \lambda)$.) Let us define the
shorthands $f_n = Qg_n$, $f = Qg$, and $h_n = \sup_{m \ge n} |f_m - f|$.

Since $f_n \to f$ pointwise, it follows from Lebesgue's dominated
convergence theorem that $Pf_n(x) \to Pf(x)$. The same argument shows that
$Ph_n(y) \to 0$ for every $y \in \mathcal X$. Since furthermore the $h_n$'s are positive
decreasing functions, one has

$$\lim_{n \to \infty} Ph_n(x_n) \le \lim_{n \to \infty} Ph_m(x_n) = Ph_m(x)\;,$$

which is valid for every $m$, thus showing that $\lim_{n \to \infty} Ph_n(x_n) = 0$. This
implies that

$$\lim_{n \to \infty} Pf_n(x_n) - Pf(x) \le \lim_{n \to \infty} |Pf_n(x_n) - Pf(x_n)| + \lim_{n \to \infty} |Pf(x_n) - Pf(x)| \le \lim_{n \to \infty} Ph_n(x_n) + 0 = 0\;,$$

thus creating the required contradiction with (1.13). □

Example 1.6.7. Let us conclude this section with an example of a strong
Feller Markov kernel which is not ultra Feller. Take $\mathcal X = [0, 1]$ and define
$P$ by

$$P(x, dy) = \begin{cases} dy & \text{if } x = 0, \\ c(x)\bigl(1 + \sin(y/x)\bigr)\,dy & \text{otherwise.} \end{cases}$$

Here, the function $c$ is chosen in such a way that $P(x,\cdot)$ is a probability
measure. It is obvious that, for any $f \in \mathcal B_b(\mathcal X)$, $Pf$ is continuous (even
$C^\infty$) outside of $x = 0$. It follows furthermore from the Riemann-Lebesgue
lemma that $Pf$ is continuous at $x = 0$. However, the map $x \mapsto P(x,\cdot)$ is
discontinuous at $0$ in the total variation topology (one has $\lim_{x \to 0} \|P(x,\cdot) -
P(0,\cdot)\|_{\mathrm{TV}} = \frac{2}{\pi}$), which shows that $P$ is not ultra Feller.

Remark 1.6.8. Since, as seen in the previous example, there are strong
Feller Markov kernels that are not ultra Feller, Theorem 1.6.6 fails in general
if one of the two kernels is only Feller (take the identity).

1.7 Appendix B: Some Gaussian measure theory

This section is devoted to a short summary of the theory of Gaussian
measures and in particular of their conditioning. Denote by $X$ some separable
Fréchet space and assume that we are given a splitting $X = X_1 \oplus X_2$. This
means that the $X_i$ are subspaces of $X$ and every element of $X$ can be written
uniquely as $x = x_1 + x_2$ with $x_i \in X_i$, and the projection maps $\Pi_i\colon x \mapsto x_i$
are continuous.

Assume that we are given a Gaussian probability measure $\mathbf P$ on $X$, with
covariance operator $Q$. That is, $Q\colon X^* \to X$ is a continuous bilinear map
such that $\langle Qf, g\rangle = \int f(x) g(x)\,\mathbf P(dx)$ for every $f$ and $g$ in $X^*$. (Such a map
exists because $\mathbf P$ is automatically a Radon measure in our case.) Here, we
used the notation $\langle f, x\rangle$ for the pairing between $X^*$ and $X$. We denote by $\mathcal H$
the reproducing kernel Hilbert space of $\mathbf P$. The space $\mathcal H$ can be constructed
as the closure of the image of the canonical map $\iota\colon X^* \to L^2(X, \mathbf P)$ given by
$(\iota h)(w) = h(w)$, so that $\mathcal H$ is the space of $\mathbf P$-measurable linear functionals on
$X$. If we assume that the support of $\mathbf P$ is all of $X$ (replace $X$ by the support
of $\mathbf P$ otherwise), then this map is an injection, so that we can identify $X^*$
with a subspace of $\mathcal H$. Any given $h \in \mathcal H$ can then be identified with the
(unique) element $h^*$ in $X$ such that $\langle Qg, h\rangle = g(h^*)$ for every $g \in X^*$.
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With this notation, the scalar product on H is given by (th,w) = h(w), or 
equivalently by (ih, ig) = {h, Qg) = (Qh, g). We will from now on use these 
identifications, so that one has 

$$X^* \subset \mathcal{H} \subset X\;, \tag{1.14}$$

and, with respect to the norm on $\mathcal{H}$, the map $Q$ is an isometry between $X^*$
and its image. For elements $x$ in the image of $Q$ (which can be identified
with a dense subset of $\mathcal{H}$), one has $\|x\|^2 = \langle x, Q^{-1}x\rangle$.
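
As a sanity check, the isometry statement can be illustrated in finite dimensions. The sketch below assumes $X = X^* = \mathbb{R}^n$ and a strictly positive definite covariance matrix `Q` (both assumptions made only for illustration; the names `rkhs_norm_sq`, `norm_of_Qg` and `variance_of_g` are hypothetical). It compares $\|Qg\|^2 = \langle Qg, Q^{-1}Qg\rangle$ with the variance $\langle Qg, g\rangle$ of the functional $g$.

```python
import numpy as np

# Finite-dimensional stand-in (an assumption for illustration only):
# X = X^* = R^n and a centred Gaussian measure with covariance matrix Q.
rng = np.random.default_rng(0)
n = 4
M = rng.standard_normal((n, n))
Q = M @ M.T + n * np.eye(n)            # symmetric, strictly positive definite


def rkhs_norm_sq(x, Q):
    """Squared norm ||x||^2 = <x, Q^{-1} x> for x in the image of Q."""
    return float(x @ np.linalg.solve(Q, x))


g = rng.standard_normal(n)             # a linear functional g in X^*
norm_of_Qg = rkhs_norm_sq(Q @ g, Q)    # norm of Qg as an element of H
variance_of_g = float(g @ Q @ g)       # <Qg, g>, the variance of g under P

print(norm_of_Qg, variance_of_g)       # the two numbers agree up to rounding
```

In this setting every space coincides with $\mathbb{R}^n$ as a set, so only the norms differ; the infinite-dimensional subtleties of the text (density, closures) are not visible here.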

Given projections $\Pi_i\colon X \to X_i$ as above, the reproducing kernel Hilbert
spaces $\mathcal{H}_i^p$ of the projected measures $\mathbf{P} \circ \Pi_i^{-1}$ are given by
$\mathcal{H}_i^p = \Pi_i \mathcal{H} \subset X_i$, and their covariance operators $Q_i$ are given by
$Q_i = \Pi_i Q \Pi_i^*\colon X_i^* \to X_i$. The norm on $\mathcal{H}_i^p$ is given by

$$\|x\|_{i,p}^2 = \inf\{\|y\|^2 : x = \Pi_i y\,,\ y \in \mathcal{H}\} = \langle x, Q_i^{-1}x\rangle\;,$$

where the last equality is valid for $x$ belonging to the image of $Q_i$. It
is noteworthy that even though the spaces $\mathcal{H}_i^p$ are not subspaces of $\mathcal{H}$ in
general, there is a natural isomorphism between $\mathcal{H}_i^p$ and some closed subspace of
$\mathcal{H}$ in the following way. For $x$ in the image of $Q_i$, define
$U_i x = Q \Pi_i^* Q_i^{-1} x \in X$. One has

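Lemma 1.7.1. For every $x$ in the image of $Q_i$, one has $U_i x \in \mathcal{H}$. Furthermore,
the map $U_i$ extends to an isometry between $\mathcal{H}_i^p$ and $U_i \mathcal{H}_i^p \subset \mathcal{H}$.

Proof. Since $U_i x$ belongs to the image of $Q$ by construction, one has
$\|U_i x\|^2 = \langle U_i x, Q^{-1} U_i x\rangle = \langle Q \Pi_i^* Q_i^{-1} x, \Pi_i^* Q_i^{-1} x\rangle = \langle x, Q_i^{-1} x\rangle = \|x\|_{i,p}^2$.
The claim follows from the fact that the image of $Q_i$ is dense in $\mathcal{H}_i^p$. □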
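
The computation in this proof can be checked numerically in a finite-dimensional sketch. It is assumed here (for illustration only) that $X = \mathbb{R}^{n_1} \oplus \mathbb{R}^{n_2}$ with the coordinate splitting, so that $\Pi_1$ is the matrix `Pi1` below and $Q_1$ is the upper-left block of `Q`; the function name `U1` is hypothetical.

```python
import numpy as np

# Assumed finite-dimensional setting: X = R^(n1+n2), X_1 = first n1 coordinates.
rng = np.random.default_rng(1)
n1, n2 = 2, 3
n = n1 + n2
M = rng.standard_normal((n, n))
Q = M @ M.T + n * np.eye(n)                          # covariance of P on X

Pi1 = np.hstack([np.eye(n1), np.zeros((n1, n2))])    # projection Pi_1: X -> X_1
Q1 = Pi1 @ Q @ Pi1.T                                 # Q_1 = Pi_1 Q Pi_1^*


def U1(x):
    """U_1 x = Q Pi_1^* Q_1^{-1} x, mapping X_1 into X."""
    return Q @ Pi1.T @ np.linalg.solve(Q1, x)


x = rng.standard_normal(n1)
u = U1(x)
norm_H_sq = float(u @ np.linalg.solve(Q, u))         # ||U_1 x||^2 in H
norm_1p_sq = float(x @ np.linalg.solve(Q1, x))       # ||x||^2_{1,p}

print(norm_H_sq, norm_1p_sq)                         # equal up to rounding
print(np.allclose(Pi1 @ u, x))                       # Pi_1 U_1 x = x
```

The second check corresponds to the identity $\Pi_i U_i x = x$, which in this setting is immediate from $\Pi_1 Q \Pi_1^* = Q_1$.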

We denote by $\hat{\mathcal{H}}_i^p$ the images of $\mathcal{H}_i^p$ under $U_i$. Via the identification
(1.14), it follows that $\hat{\mathcal{H}}_i^p$ is actually nothing but the closure in $\mathcal{H}$ of the
image of $X_i^*$ under $\Pi_i^*$. Denoting by $\hat\Pi_i\colon \mathcal{H} \to \mathcal{H}$ the orthogonal projection
(in $\mathcal{H}$) onto $\hat{\mathcal{H}}_i^p$, it is a straightforward calculation to see that one has the
identity $\hat\Pi_i x = U_i \Pi_i x$. On the other hand, it follows from the definition of
$Q_i$ that $\Pi_i U_i x = x$, so that $\Pi_i\colon \hat{\mathcal{H}}_i^p \to \mathcal{H}_i^p$ is the inverse of the isomorphism
$U_i$.

We can also define subspaces $\mathcal{H}_i^c$ of $\mathcal{H}$ by

$$\mathcal{H}_i^c = \overline{\mathcal{H} \cap X_i} = \overline{\mathcal{H} \cap \mathcal{H}_i^p}\;, \tag{1.15}$$

where we used the identification (1.14) and the embedding $X_i \subset X$. The
closures are taken with respect to the topology of $\mathcal{H}$. The spaces $\mathcal{H}_i^c$ are again
Hilbert spaces (they inherit their structure from $\mathcal{H}$, not from $\mathcal{H}_i^p$!) and they
therefore define Gaussian measures $\mathbf{P}_i$ on $X_i$. Note that for $x \in \mathcal{H}_i^c \cap \mathcal{H}_i^p$,
one has $\|x\| \ge \|x\|_{i,p}$, so that the inclusion $\mathcal{H}_i^c \subset \mathcal{H}_i^p$ holds. One has

Lemma 1.7.2. One has $\mathcal{H}_1^c = (\hat{\mathcal{H}}_2^p)^\perp$ and vice-versa.

Proof. It is an immediate consequence of the facts that $X = X_1 \oplus X_2$, that
$\hat{\mathcal{H}}_i^p$ is the closure of the image of $\Pi_i^*$ and that, via the identification (1.14),
the scalar product in $\mathcal{H}$ is an extension of the duality pairing between $X$
and $X^*$. □

We now define a (continuous) operator $P\colon \mathcal{H}_1^p \to \mathcal{H}_2^p$ by $Px = \Pi_2 U_1 x$.
It follows from the previous remarks that $P$ is unitarily equivalent to the
orthogonal projection (in $\mathcal{H}$) from $\hat{\mathcal{H}}_1^p$ to $\hat{\mathcal{H}}_2^p$. Furthermore, one has
$Px = U_1 x - x$, so that

$$\|Px\| \le \|x\|_{1,p} + \|x\|\;, \tag{1.16}$$

which, combined with (1.15), shows that $P$ can be extended to a bounded
operator from $\mathcal{H}_1^c$ to $\mathcal{H}_2^c$.

A standard result in Gaussian measure theory states that $P$ can be
extended to a $(\mathbf{P} \circ \Pi_1^{-1})$-measurable linear operator $\hat P\colon X_1 \to X_2$. With
these notations at hand, the main statement of this section is given by:

Proposition 1.7.3. The measure $\mathbf{P}$ admits the disintegration

$$\int_X \phi(x)\,\mathbf{P}(dx) = \int_{X_1} \int_{X_2} \phi(x + \hat P x + y)\,\mathbf{P}_2(dy)\,(\mathbf{P} \circ \Pi_1^{-1})(dx)\;. \tag{1.17}$$

Proof. Denote by $\nu$ the measure on the right hand side. Since $\nu$ is the
image of the Gaussian measure $\mu = (\mathbf{P} \circ \Pi_1^{-1}) \otimes \mathbf{P}_2$ under the $\mu$-measurable
linear operator $A\colon (x, y) \mapsto x + \hat P x + y$, it follows from (Bogachev 1998,
Theorem 3.10) that $\nu$ is again a Gaussian measure. The claim then follows
if we can show that the reproducing kernel Hilbert space of $\nu$ is equal to $\mathcal{H}$.
Since the reproducing kernel space $\mathcal{H}(\mu)$ of $\mu$ is canonically isomorphic to
$\mathcal{H}_1^p \oplus \mathcal{H}_2^c \subset \mathcal{H} \oplus \mathcal{H}$, this is equivalent to the fact that the operator
$x \mapsto x + Px = x + \Pi_2 U_1 x$ from $\mathcal{H}_1^p$ to $\mathcal{H}$ is an isometry between $\mathcal{H}_1^p$ and
$(\mathcal{H}_2^c)^\perp$. On the other hand, we know from Lemma 1.7.2 that $(\mathcal{H}_2^c)^\perp = \hat{\mathcal{H}}_1^p$
and we know from Lemma 1.7.1 that $U_1$ is an isomorphism between $\mathcal{H}_1^p$ and
$\hat{\mathcal{H}}_1^p$. Finally, it follows from the definitions that $\Pi_1 U_1 x = Q_1 Q_1^{-1} x = x$ for
every $x \in \mathcal{H}_1^p$, so that one has $x + \Pi_2 U_1 x = (\Pi_1 + \Pi_2) U_1 x = U_1 x$, which
completes the proof. □
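
In finite dimensions the disintegration reduces to the familiar conditioning formula for Gaussian vectors, which gives a quick way to check the statement. The sketch below assumes $X = \mathbb{R}^{n_1} \oplus \mathbb{R}^{n_2}$ with a positive definite covariance `Q` written in blocks (an illustrative assumption; all variable names are hypothetical). Then $Px = \Pi_2 U_1 x = Q_{21} Q_{11}^{-1} x$, the measure $\mathbf{P}_2$ has as covariance the inverse of the lower-right block of $Q^{-1}$, and the covariance of the image measure under $(x, y) \mapsto x + Px + y$ should be $Q$ again.

```python
import numpy as np

# Assumed finite-dimensional setting: X = R^(n1+n2) with coordinate splitting.
rng = np.random.default_rng(2)
n1, n2 = 2, 3
n = n1 + n2
M = rng.standard_normal((n, n))
Q = M @ M.T + n * np.eye(n)                       # covariance of P on X
Q11, Q12 = Q[:n1, :n1], Q[:n1, n1:]
Q21, Q22 = Q[n1:, :n1], Q[n1:, n1:]

# Operator P x = Pi_2 U_1 x, here the matrix Q21 Q11^{-1}.
P_op = Q21 @ np.linalg.inv(Q11)

# The norm of H restricted to X_2 is y -> y^T (Q^{-1})_{22} y, so the
# Gaussian measure P_2 has covariance ((Q^{-1})_{22})^{-1}.
S = np.linalg.inv(np.linalg.inv(Q)[n1:, n1:])
schur = Q22 - Q21 @ np.linalg.inv(Q11) @ Q12      # Schur complement of Q11

# Linear map A: (x, y) -> x + P x + y, and the covariance of the product
# measure (P o Pi_1^{-1}) x P_2 on X_1 x X_2.
A_map = np.block([[np.eye(n1), np.zeros((n1, n2))],
                  [P_op, np.eye(n2)]])
C = np.block([[Q11, np.zeros((n1, n2))],
              [np.zeros((n2, n1)), S]])
Q_rebuilt = A_map @ C @ A_map.T                   # covariance of the image measure

print(np.allclose(S, schur))                      # P_2 has the Schur complement as covariance
print(np.allclose(Q_rebuilt, Q))                  # image measure has covariance Q
```

Since a centred Gaussian measure is determined by its covariance, agreement of `Q_rebuilt` with `Q` is the finite-dimensional counterpart of (1.17); the measurability and extension questions handled in the proof do not arise here.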